ADR-023: Multi-Router Gateway Support (OpenRouter, Requesty, Direct APIs)
- Status: COMPLETE (all gateways implemented in v0.12.2)
- Date: 2025-12-22
- Decision Makers: Engineering, Architecture
- Council Review: Completed (all 4 models responded, Reasoning Tier)
- Layer Assignment: Layer 4, Gateway Routing (per ADR-024)
Layer Context (ADR-024)
This ADR operates at Layer 4 in the unified routing architecture:
| Layer | ADR | Responsibility |
|---|---|---|
| L1 | ADR-022 | Tier Selection (quick/balanced/high/reasoning) |
| L2 | ADR-020 | Query Triage & Model Selection |
| L3 | Core | Council Execution (Stage 1-3) |
| L4 | ADR-023 | Gateway Routing |
Interaction Rules:
- Layer 4 receives resolved model IDs and CanonicalMessage from Layer 3
- Gateway selection is based on model → gateway mapping
- Gateway fallback is for infrastructure failures only (timeout, 5xx, rate limit)
- Gateway failures NEVER trigger tier escalation (per ADR-024 council decision)
- All gateways exhausted → raise TransportFailure with clear error
Terminology Note
This ADR uses "gateway routing" to describe infrastructure-level decisions about which API endpoint services a model request. This is distinct from:
- Model selection/triage (ADR-020): Semantic decisions about which model to use based on query complexity
- Tier selection (ADR-022): Choosing model pools based on confidence level
The term "routing" in this ADR always refers to gateway/provider selection, not model selection.
Context
LLM Council currently has a hardcoded dependency on OpenRouter as the sole gateway for model access. While OpenRouter provides excellent multi-model access, this creates several limitations:
Current Architecture Limitations
```python
# config.py - Hardcoded OpenRouter dependency
OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"
OPENROUTER_API_KEY = _get_api_key()  # Single key, single provider
```
| Issue | Impact |
|---|---|
| Single Point of Failure | OpenRouter outage = complete council failure |
| No Gateway Choice | Users cannot leverage Requesty, direct APIs, or other routers |
| Cost Optimization Limited | Cannot route to cheaper gateways for specific models |
| BYOK Constraints | No support for using personal API keys directly |
| Enterprise Deployment | Some organizations cannot use third-party routers |
Gateway Comparison: OpenRouter vs Requesty
Based on a review of both gateways' documentation, here is a detailed comparison:
| Feature | OpenRouter | Requesty |
|---|---|---|
| Base URL | https://openrouter.ai/api/v1 | https://router.requesty.ai/v1 |
| Model Count | "Hundreds" | 300+ |
| API Compatibility | OpenAI-compatible | OpenAI + Anthropic compatible |
| Pricing Model | Credit-based, passthrough | Passthrough + usage-based |
| Smart Routing | Basic failover | Latency-based + load balancing |
| Auto-Failover | Yes (implicit) | Yes (configurable chains) |
| Caching | Not documented | Auto-caching with toggle |
| BYOK | No | Yes (bring your own keys) |
| Cost Analytics | Basic dashboard | Rich analytics + budgets |
| Enterprise Features | Limited | RBAC, SSO, SOC2 |
| Claude Code Integration | Via OpenAI SDK | Native integration |
Requesty Unique Capabilities
- Fallback Chains: Configurable retry policies with exponential backoff
- BYOK (Bring Your Own Keys): Use existing API credentials through the gateway
- Cost Tracking: Real-time budget management per user/project/API key
- MCP Gateway: Native integration with Model Context Protocol servers
OpenRouter Unique Capabilities
- Community Rankings: Leaderboard visibility for apps
- Established Ecosystem: Wider third-party integration support
- Credit System: Pre-pay with known balance
Decision
Implement a Router Abstraction Layer that supports multiple gateways with per-model routing configuration.
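Proposed Architecture
Combined Architecture with ADR-020 (Model Selection Layer)
This ADR covers gateway routing, while ADR-020 covers query triage and model selection. The diagram below predates ADR-024 and uses the original three-layer numbering (Layer 0 = triage, Layer 1 = council execution, Layer 2 = gateway routing), which corresponds to L2, L3, and L4 in the ADR-024 table above. The layers compose as follows: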
```text
┌─────────────────────────────────────────────────────────────────┐
│ USER REQUEST                                                    │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 0: QUERY TRIAGE & MODEL SELECTION (ADR-020)               │
├─────────────────────────────────────────────────────────────────┤
│ • Complexity classification (simple/medium/complex)             │
│ • Prompt optimization (per-model adaptation)                    │
│ • Dynamic wildcard seat selection (Not Diamond)                 │
│ • Output: ResolvedModelId(s) + optimized prompts                │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 1: COUNCIL EXECUTION                                      │
├─────────────────────────────────────────────────────────────────┤
│ • Stage 1: Parallel model queries                               │
│ • Stage 2: Anonymized peer review                               │
│ • Stage 3: Chairman synthesis                                   │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 2: GATEWAY ROUTING (THIS ADR - ADR-023)                   │
├─────────────────────────────────────────────────────────────────┤
│ Gateway Abstraction Layer                                       │
│ ├── Input: ResolvedModelId + CanonicalMessage[]                 │
│ ├── Gateway selection based on model → gateway mapping          │
│ ├── BYOK credential injection                                   │
│ └── Fallback chain on infrastructure failure                    │
└─────────────────────────────────────────────────────────────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
│ OpenRouter        │  │ Requesty          │  │ Direct API        │
│ Gateway           │  │ Gateway           │  │ Gateway           │
├───────────────────┤  ├───────────────────┤  ├───────────────────┤
│ • Established     │  │ • BYOK support    │  │ • No middleman    │
│ • Wide support    │  │ • Smart routing   │  │ • Lowest latency  │
│ • Credit model    │  │ • Cost tracking   │  │ • Full control    │
└───────────────────┘  └───────────────────┘  └───────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ LLM PROVIDER APIs                                               │
└─────────────────────────────────────────────────────────────────┘
```
Execution Order
The correct execution order when both ADR-020 and ADR-023 are enabled:
1. Triage (ADR-020): Analyze query complexity, decide fast-path vs full council
2. Model Selection (ADR-020): Resolve dynamic models (e.g., dynamic:not-diamond → deepseek/deepseek-v3)
3. Gateway Selection (ADR-023): Map resolved model ID to gateway
4. API Call (ADR-023): Execute with fallback chain on failure
Router Interface
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RouterConfig:
    name: str
    base_url: str
    api_key_env: str
    timeout: float = 120.0
    retry_policy: Optional[Dict] = None
    extra_headers: Optional[Dict[str, str]] = None


class BaseRouter(ABC):
    """Abstract base for all AI gateway routers."""

    def __init__(self, config: RouterConfig):
        self.config = config

    @abstractmethod
    async def query(
        self,
        model: str,
        messages: List[Dict[str, str]],
        **kwargs: Any,
    ) -> Dict[str, Any]:
        """Execute a query through this router."""

    @abstractmethod
    def normalize_model_id(self, model: str) -> str:
        """Convert model ID to router-specific format."""

    @abstractmethod
    def supports_model(self, model: str) -> bool:
        """Check if this router supports the given model."""
```
Built-in Router Implementations
```python
# Uses BaseRouter and RouterConfig from the Router Interface above;
# abstract method implementations are omitted for brevity.

# routers/openrouter.py
class OpenRouterGateway(BaseRouter):
    DEFAULT_CONFIG = RouterConfig(
        name="openrouter",
        base_url="https://openrouter.ai/api/v1/chat/completions",
        api_key_env="OPENROUTER_API_KEY",
    )


# routers/requesty.py
class RequestyGateway(BaseRouter):
    DEFAULT_CONFIG = RouterConfig(
        name="requesty",
        base_url="https://router.requesty.ai/v1/chat/completions",
        api_key_env="REQUESTY_API_KEY",
    )

    # Requesty-specific: BYOK configuration
    def with_byok(self, provider_key_env: str) -> "RequestyGateway":
        """Configure BYOK mode for this gateway."""
        ...


# routers/direct.py
class DirectAPIGateway(BaseRouter):
    """Direct API access to providers (Anthropic, OpenAI, Google)."""

    PROVIDER_CONFIGS = {
        "anthropic": RouterConfig(
            name="anthropic-direct",
            base_url="https://api.anthropic.com/v1/messages",
            api_key_env="ANTHROPIC_API_KEY",
        ),
        "openai": RouterConfig(
            name="openai-direct",
            base_url="https://api.openai.com/v1/chat/completions",
            api_key_env="OPENAI_API_KEY",
        ),
        "google": RouterConfig(
            name="google-direct",
            base_url="https://generativelanguage.googleapis.com/v1/models",
            api_key_env="GOOGLE_API_KEY",
        ),
    }
```
Configuration Schema
```python
# config.py additions
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GatewayConfig:
    """Multi-router configuration (ADR-023)."""

    # Default router for all models
    default_router: str = "openrouter"  # openrouter | requesty | direct

    # Per-model router overrides
    # Format: {"model_prefix": "router_name"}
    # Example: {"anthropic/": "requesty", "openai/": "direct"}
    model_routing: Dict[str, str] = field(default_factory=dict)

    # Fallback chain when primary router fails
    fallback_chain: List[str] = field(default_factory=lambda: ["openrouter"])

    # BYOK configuration (Requesty-specific)
    # Example: {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}
    byok_enabled: bool = False
    byok_keys: Dict[str, str] = field(default_factory=dict)
```
Environment Variable Configuration
```bash
# Primary router selection: openrouter | requesty | direct (default: openrouter)
LLM_COUNCIL_ROUTER=openrouter

# API keys per router
OPENROUTER_API_KEY=sk-or-...
REQUESTY_API_KEY=sk-req-...
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...

# Per-model routing (JSON format)
LLM_COUNCIL_MODEL_ROUTING='{"anthropic/": "requesty", "openai/": "direct"}'

# Fallback chain
LLM_COUNCIL_FALLBACK_CHAIN=openrouter,requesty,direct

# BYOK mode (Requesty)
LLM_COUNCIL_BYOK=true
LLM_COUNCIL_BYOK_KEYS='{"anthropic": "ANTHROPIC_API_KEY"}'
```
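A minimal sketch of how these variables might be parsed into the GatewayConfig fields, assuming the formats shown above (JSON objects for the routing and BYOK maps, a comma-separated fallback chain). The names ParsedGatewayEnv, parse_gateway_env, and VALID_ROUTERS are hypothetical, and treating "1", "true", and "yes" as truthy for LLM_COUNCIL_BYOK is an assumption. The value "auto" is accepted for LLM_COUNCIL_ROUTER because the council review later introduces it as an opt-in.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Mapping

# Hypothetical set of gateway names this sketch recognizes
VALID_ROUTERS = {"openrouter", "requesty", "direct"}


@dataclass
class ParsedGatewayEnv:
    """Hypothetical container mirroring the GatewayConfig fields."""

    default_router: str = "openrouter"
    model_routing: Dict[str, str] = field(default_factory=dict)
    fallback_chain: List[str] = field(default_factory=lambda: ["openrouter"])
    byok_enabled: bool = False
    byok_keys: Dict[str, str] = field(default_factory=dict)


def parse_gateway_env(env: Mapping[str, str]) -> ParsedGatewayEnv:
    """Build gateway settings from LLM_COUNCIL_* variables."""
    cfg = ParsedGatewayEnv()
    router = env.get("LLM_COUNCIL_ROUTER", "openrouter").strip().lower()
    if router not in VALID_ROUTERS | {"auto"}:
        raise ValueError(f"Unknown LLM_COUNCIL_ROUTER: {router!r}")
    cfg.default_router = router
    if "LLM_COUNCIL_MODEL_ROUTING" in env:
        # JSON object: {"model_prefix": "router_name"}
        cfg.model_routing = json.loads(env["LLM_COUNCIL_MODEL_ROUTING"])
    if "LLM_COUNCIL_FALLBACK_CHAIN" in env:
        # Comma-separated names; surrounding whitespace and empty entries are ignored
        raw = env["LLM_COUNCIL_FALLBACK_CHAIN"].split(",")
        cfg.fallback_chain = [name.strip() for name in raw if name.strip()]
    flag = env.get("LLM_COUNCIL_BYOK", "false").strip().lower()
    cfg.byok_enabled = flag in {"1", "true", "yes"}
    if "LLM_COUNCIL_BYOK_KEYS" in env:
        cfg.byok_keys = json.loads(env["LLM_COUNCIL_BYOK_KEYS"])
    # Every routing target and fallback entry must name a known gateway
    unknown = (set(cfg.model_routing.values()) | set(cfg.fallback_chain)) - VALID_ROUTERS
    if unknown:
        raise ValueError(f"Unknown router name(s): {sorted(unknown)}")
    return cfg
```

Taking a Mapping rather than reading os.environ directly keeps the parser easy to test; at startup a caller would pass os.environ.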
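Use Cases Enabled
1. Enterprise Deployment (Direct APIs Only)
Organizations that cannot use third-party routers due to compliance requirements can set LLM_COUNCIL_ROUTER=direct, supply only the provider keys they are permitted to use (for example ANTHROPIC_API_KEY and OPENAI_API_KEY), and set LLM_COUNCIL_FALLBACK_CHAIN=direct so that no request falls back to an aggregator.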
2. Cost Optimization (Requesty with BYOK)
Leverage Requesty analytics while using existing API agreements:
```bash
LLM_COUNCIL_ROUTER=requesty
REQUESTY_API_KEY=sk-req-...
LLM_COUNCIL_BYOK=true
LLM_COUNCIL_BYOK_KEYS='{"anthropic": "ANTHROPIC_API_KEY"}'
```
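3. High Availability (Fallback Chain)
Router failover for maximum reliability: set LLM_COUNCIL_FALLBACK_CHAIN=openrouter,requesty,direct and provide a valid key for every gateway in the chain, so that an infrastructure failure on one gateway is retried on the next.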
4. Hybrid Routing (Per-Model Optimization)
Route different models through optimal gateways:
```bash
LLM_COUNCIL_ROUTER=openrouter
LLM_COUNCIL_MODEL_ROUTING='{
  "anthropic/claude": "requesty",
  "openai/": "direct",
  "google/": "openrouter"
}'
```
Migration Strategy
Phase 1: Abstraction Layer (Weeks 1-2)
- Create `BaseRouter` abstract class
- Refactor existing `openrouter.py` to implement `OpenRouterGateway`
- Add `RouterRegistry` for runtime router selection
- Maintain 100% backward compatibility
```python
# Backward-compatible usage
from llm_council.openrouter import query_model  # Still works

# New usage
from llm_council.routers import get_router


async def ask(model: str, messages: list[dict[str, str]]):
    router = get_router("openrouter")
    return await router.query(model, messages)
```
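One way the RouterRegistry bullet could be realized, as a hypothetical sketch: a name-to-factory map with lazy instantiation, so a router whose API key is absent is never constructed unless it is actually selected. The class shape and method names are assumptions rather than the llm_council API; get_router("openrouter") could simply delegate to registry.get("openrouter").

```python
from typing import Callable, Dict


class RouterRegistry:
    """Hypothetical name-to-factory registry for runtime router selection."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._instances: Dict[str, object] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        if name in self._factories:
            raise ValueError(f"Router {name!r} is already registered")
        self._factories[name] = factory

    def get(self, name: str) -> object:
        # Instantiate lazily and cache, so unused routers never need their keys
        if name not in self._instances:
            try:
                factory = self._factories[name]
            except KeyError:
                known = ", ".join(sorted(self._factories)) or "none"
                raise KeyError(f"Unknown router {name!r} (registered: {known})") from None
            self._instances[name] = factory()
        return self._instances[name]

    def names(self) -> list[str]:
        return sorted(self._factories)
```

Caching instances means every caller shares one router object per name, which is also where per-router state such as a circuit breaker would naturally live.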
Phase 2: Requesty Integration - COMPLETE (v0.12.2, Issue #66)
- [x] Implement `RequestyGateway` with BYOK support
- [x] Add fallback chain logic (integrated with GatewayRouter)
- [x] Implement per-model routing
- [x] 20 TDD tests for RequestyGateway
Phase 3: Direct API Support - COMPLETE (v0.12.2, Issue #67)
- [x] Implement `DirectGateway` for Anthropic, OpenAI, Google
- [x] Handle provider-specific message formats (OpenAI, Anthropic, Google)
- [x] Add Anthropic Messages API support (differs from OpenAI format)
- [x] 24 TDD tests for DirectGateway
Phase 4: Configuration UI (Future)
- `llm-council config router` CLI command
- Interactive router selection and testing
- Key validation per router
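Potential Issues and Mitigations
Issue 1: Message Format Incompatibility
Problem: The Anthropic Messages API differs from the OpenAI Chat Completions format; for example, the system prompt is a top-level field rather than a message.
Mitigation: Router-specific message transformers:
```python
from typing import Any, Dict, List


class AnthropicTransformer:
    def transform_messages(self, messages: List[Dict[str, str]]) -> Dict[str, Any]:
        """Convert OpenAI-style chat messages to Anthropic Messages API fields."""
        # Anthropic takes the system prompt as a top-level field, not a message role
        system_parts = [m["content"] for m in messages if m["role"] == "system"]
        chat_messages = [m for m in messages if m["role"] != "system"]
        payload: Dict[str, Any] = {"messages": chat_messages}
        # Keep every system message, and omit "system" entirely when there is none
        if system_parts:
            payload["system"] = "\n\n".join(system_parts)
        return payload
```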
Issue 2: Inconsistent Model IDs
Problem: Model IDs vary across routers (anthropic/claude-3-opus vs claude-3-opus).
Mitigation: Canonical model ID registry with per-router mappings:
```python
# Illustrative IDs; verify against each gateway's current model list
MODEL_ID_MAP = {
    "claude-opus-4.5": {
        "openrouter": "anthropic/claude-opus-4.5",
        "requesty": "anthropic/claude-opus-4.5",
        "direct": "claude-opus-4-5",  # Anthropic native alias
    }
}
```
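A router's normalize_model_id could delegate to a lookup like the one below. This is a sketch under two assumptions: canonical IDs with no entry pass through unchanged (so IDs already in a gateway's native form keep working), and a canonical ID that is mapped but lacks an entry for the requested router is an error rather than a silent pass-through. The helper name lookup_model_id and the map contents are hypothetical.

```python
from typing import Dict, Mapping

# Hypothetical canonical map in the same shape as MODEL_ID_MAP above
EXAMPLE_ID_MAP: Dict[str, Dict[str, str]] = {
    "claude-opus-4.5": {
        "openrouter": "anthropic/claude-opus-4.5",
        "direct": "claude-opus-4-5",
    },
}


def lookup_model_id(
    canonical_id: str,
    router: str,
    id_map: Mapping[str, Mapping[str, str]] = EXAMPLE_ID_MAP,
) -> str:
    """Return the router-specific model ID for a canonical ID."""
    per_router = id_map.get(canonical_id)
    if per_router is None:
        # Unmapped IDs pass through unchanged
        return canonical_id
    try:
        return per_router[router]
    except KeyError:
        # Mapped model but no entry for this router: fail loudly
        raise KeyError(f"{canonical_id!r} has no ID mapping for router {router!r}") from None
```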
Issue 3: Feature Parity Gaps
Problem: Some routers support features others don't (e.g., Requesty caching).
Mitigation: Feature capability flags per router:
```python
from dataclasses import dataclass


@dataclass
class RouterCapabilities:
    supports_streaming: bool = True
    supports_tool_calling: bool = True
    supports_caching: bool = False  # Requesty-only
    supports_byok: bool = False  # Requesty-only
    supports_fallback_chains: bool = False  # Requesty-only
```
Issue 4: Authentication Complexity
Problem: Multiple API keys to manage.
Mitigation: Leverage existing ADR-013 keychain support + clear validation:
```python
from typing import List


async def validate_router_config() -> List[str]:
    """Return list of configuration warnings/errors."""
    issues = []
    router = get_default_router()
    if not router.has_valid_key():
        issues.append(f"Missing API key for {router.name}")
    for model in COUNCIL_MODELS:
        # Check the router actually mapped to this model, not the default one
        model_router = get_router_for_model(model)
        if not model_router.supports_model(model):
            issues.append(f"Router {model_router.name} doesn't support {model}")
    return issues
```
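Issue 5: Latency Overhead from Router Switching
Problem: Fallback chain adds latency on failures.
Mitigation: Parallel probing + circuit breaker pattern:
```python
import time
from typing import Any, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the router's circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: int = 60):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.last_failure: Optional[float] = None
        self.is_open = False

    async def call(self, router: BaseRouter, *args: Any, **kwargs: Any) -> Any:
        # While open, reject calls until reset_timeout has elapsed; after that,
        # one trial call is let through (half-open) to test recovery
        if self.is_open and time.monotonic() - self.last_failure < self.reset_timeout:
            raise CircuitOpenError(f"{router.config.name} circuit is open")
        try:
            result = await router.query(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_failure = time.monotonic()
            if self.failures >= self.failure_threshold:
                self.is_open = True
            raise
        self.failures = 0
        self.is_open = False
        return result
```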
Alternatives Considered
Alternative 1: Requesty-Only Migration
Approach: Replace OpenRouter with Requesty as the sole gateway.
Rejected because:
- Breaks existing user configurations
- Loses OpenRouter's established ecosystem
- Creates a new single point of failure
Alternative 2: Configuration-Only (No Abstraction)
Approach: Just add an environment variable for the API URL.
Rejected because:
- Doesn't handle format differences between APIs
- No structured fallback support
- No per-model routing capability
Alternative 3: Plugin Architecture
Approach: External router plugins loaded at runtime.
Rejected because:
- Over-engineered for 3-4 known routers
- Adds complexity for users
- Security implications of loading external code
Open Questions for Council Review
1. Should BYOK be a first-class feature or Requesty-specific?
   - BYOK could apply to direct APIs too (already "BYO" by definition)
   - Standardize BYOK interface across all routers?
2. How should router health be surfaced in MCP health checks?
   - Current `council_health_check` only checks OpenRouter
   - Should it check all configured routers or just the default?
3. What's the right default for new installations?
   - OpenRouter (established, wide support) vs Requesty (more features)
   - Should the default depend on detected API keys?
4. Should fallback chains be automatic or explicit?
   - Auto-fallback improves reliability but may surprise users
   - Explicit requires more configuration but is predictable
5. How to handle cost tracking across multiple routers?
   - ADR-011 (cost tracking) assumes a single provider
   - Need a unified cost model or per-router tracking?
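Integration with ADR-020 (Not Diamond)
This section documents how ADR-023 (Gateway Routing) integrates with ADR-020 (Not Diamond Integration Strategy). Layer numbers in this section follow the original three-layer model (Layer 0 = triage, Layer 1 = council execution, Layer 2 = gateway routing), which ADR-024 renumbers as L2, L3, and L4.
Layering Model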
| Layer | ADR | Responsibility | Output |
|---|---|---|---|
| Layer 0 | ADR-020 | Query triage, model selection | ResolvedModelId + optimized prompt |
| Layer 1 | Core | Council execution (Stage 1-3) | Model responses |
| Layer 2 | ADR-023 | Gateway routing, BYOK, fallback | API call results |
Key Integration Points
1. Dynamic Model Resolution
When ADR-020's wildcard seat dynamically selects a model (e.g., dynamic:not-diamond → deepseek/deepseek-v3), gateway routing applies to the resolved model ID.
```python
async def route_wildcard_seat(query, messages):
    # ADR-020 resolves dynamic model
    resolved_model = await not_diamond.select_model(query)  # "deepseek/deepseek-v3"
    # ADR-023 routes resolved model to gateway
    gateway = get_gateway_for_model(resolved_model)  # OpenRouter
    return await gateway.complete(messages=messages, model=resolved_model)
```
Requirement: If ADR-020 dynamically selects a model not explicitly mapped in ADR-023 config, the default gateway is used.
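The mapping rule, including the default for unmapped wildcard picks, can be sketched as a longest-prefix match. The document shows both plain prefixes ("anthropic/" in the env examples) and glob-style keys ("anthropic/*" in the YAML examples); this sketch treats a trailing "*" as a prefix marker and lets the most specific pattern win, both of which are assumptions, since the ADR does not define precedence. get_gateway_name_for_model is a hypothetical helper.

```python
from typing import Mapping


def get_gateway_name_for_model(
    model_id: str,
    model_routing: Mapping[str, str],
    default_gateway: str = "openrouter",
) -> str:
    """Pick a gateway for a resolved model ID by longest matching prefix.

    Patterns may be plain prefixes ("anthropic/") or end in "*"
    ("anthropic/*"); both are treated as prefixes. A model that matches
    no pattern, such as a wildcard pick from ADR-020, gets the default.
    """
    best_prefix = ""
    best_gateway = default_gateway
    for pattern, gateway in model_routing.items():
        prefix = pattern[:-1] if pattern.endswith("*") else pattern
        # Longer (more specific) prefixes override shorter ones
        if model_id.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_gateway = prefix, gateway
    return best_gateway
```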
2. Canonical Format Pipeline
The canonical formats are related but serve different purposes:
```text
User Prompt
    ↓
[ADR-020] Canonical Task Spec (for optimization decisions)
    ↓
[ADR-020] Optimized Prompt (model-specific)
    ↓
[ADR-023] CanonicalMessage (gateway-agnostic wire format)
    ↓
[ADR-023] Gateway-specific format (OpenAI/Anthropic/etc)
```
ADR-023's CanonicalMessage is the output of any prompt optimization performed by ADR-020.
3. Failure Handling Separation
| Failure Type | Handler | Behavior |
|---|---|---|
| Infrastructure failure (timeout, 5xx, rate limit) | ADR-023 Fallback Chain | Retry via next gateway |
| Low-confidence triage | ADR-020 Confidence Gate | Escalate to full council |
| Model returns poor result | Neither (application layer) | Future enhancement |
Gateway fallback is independent of ADR-020's confidence-gated triage:
```text
Triage → Single Model Fast Path
   │
   ▼
Gateway A fails → [ADR-023] Try Gateway B
   │
   ▼
Gateway B fails → [ADR-023] Try Gateway C
   │
   ▼
All gateways exhausted
   │
   ▼
[ADR-023] Raise TransportFailure exception
   │
   ▼
[Application] May escalate to full council
```
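The flow above can be sketched as a loop over the configured chain that only advances on infrastructure errors and raises TransportFailure once every gateway has failed. InfrastructureError, call_with_fallback, and the call(gateway, model) callback are hypothetical; TransportFailure is named in this ADR, but the fields given to it here are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


class InfrastructureError(Exception):
    """Hypothetical base for 5xx responses and rate limits."""


class TransportFailure(Exception):
    """Raised when every gateway in the chain failed for infrastructure reasons."""

    def __init__(self, model: str, errors: dict[str, Exception]):
        self.model = model
        self.errors = errors
        detail = "; ".join(f"{gw}: {err!r}" for gw, err in errors.items())
        super().__init__(f"All gateways exhausted for {model}: {detail}")


async def call_with_fallback(
    model: str,
    chain: Sequence[str],
    call: Callable[[str, str], Awaitable[T]],
) -> T:
    """Try each gateway in order; only infrastructure errors move on to the next."""
    errors: dict[str, Exception] = {}
    for gateway in chain:
        try:
            return await call(gateway, model)
        except (InfrastructureError, asyncio.TimeoutError) as exc:
            errors[gateway] = exc  # record, then try the next gateway
        # Any other exception (auth, invalid request) propagates immediately
    raise TransportFailure(model, errors)
```

Escalation to the full council stays outside this function: the application decides what to do with a TransportFailure, which keeps gateway failures from ever triggering tier escalation.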
Configuration Interaction
Both ADRs have independent configuration, but a unified config schema is recommended:
```yaml
# Recommended: llm_council_config.yaml
council:
  # ADR-020 concerns
  triage:
    mode: confidence_gated
    prompt_optimization: true
  wildcard:
    enabled: true
    selector: not_diamond
    api_key: ${NOT_DIAMOND_API_KEY}

  # ADR-023 concerns
  gateways:
    default: openrouter
    model_routing:
      "anthropic/*": requesty
      "deepseek/*": openrouter
    fallback_chain: [openrouter, requesty, direct]
    byok:
      enabled: true
      keys:
        anthropic: ${ANTHROPIC_API_KEY}
```
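YAML itself has no environment-variable interpolation, so the ${NAME} placeholders in the config above must be expanded by the loader after parsing. A minimal sketch, assuming the file has already been parsed into plain dicts and lists (for example with PyYAML's yaml.safe_load) and that an unset variable should fail loudly rather than become an empty string; expand_env is a hypothetical helper.

```python
import re
from typing import Any, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Mapping[str, str]) -> Any:
    """Recursively replace ${NAME} in every string of a parsed config tree."""
    if isinstance(value, str):
        def sub(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"Config references unset variable {name}")
            return env[name]

        return _VAR.sub(sub, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    # Numbers, booleans and None are returned unchanged
    return value
```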
Out of Scope
This ADR covers gateway routing (which API endpoint services a model). It does NOT cover:
| Concern | Responsible ADR |
|---|---|
| Query triage / complexity classification | ADR-020 |
| Model selection (which model to use) | ADR-020 |
| Not Diamond integration | ADR-020 |
| Prompt optimization | ADR-020 |
| Tier selection (quick/balanced/high) | ADR-022 |
Not Diamond operates at a higher layer (query triage) and its output (selected models) feeds INTO the gateway routing layer defined here. Not Diamond is NOT a gateway type.
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Backward compatibility | 100% | Existing configs work unchanged |
| Router switching latency | <50ms | Overhead of router selection |
| Fallback success rate | >95% | Failed requests recovered by fallback |
| Configuration complexity | <5 env vars for common cases | User survey |
| Enterprise adoption | Enable blocked deployments | Customer feedback |
Council Review Summary
Status: ACCEPT WITH MODIFICATIONS
Initial Review (2025-12-22)
Reviewed by: Gemini 3 Pro (32s), Claude Opus 4.5 (82s), Grok-4 (89s), GPT-5.2-pro (129s)
Council Verdict: All 4 models responded with unanimous agreement that the Router Abstraction Layer is architecturally correct. However, significant modifications are required before implementation.
Harmonization Review (2025-12-22)
Reviewed by: Gemini 3 Pro (34s), Claude Opus 4.5 (65s), Grok-4 (78s), GPT-5.2-pro (153s)
Harmonization Verdict: ADR-020 and ADR-023 are architecturally compatible. Required changes are documentation-level, not design-level:
- ADR-020 is the "Brain" (Decision Layer): model selection, triage, prompt optimization
- ADR-023 is the "Nervous System" (Transport Layer): gateway routing, failover, BYOK
Key Consensus:
1. Terminology: ADR-023 uses "gateway routing"; ADR-020 uses "model selection/triage"
2. Not Diamond is a meta-layer, NOT a gateway type
3. Execution order: Triage → Model Selection → Gateway Selection → API Call
4. Canonical formats: Keep separate but coordinated (Task Spec → CanonicalMessage pipeline)
5. Failure handling: Infrastructure failures (ADR-023) vs competence failures (ADR-020) are orthogonal
Consensus Answers to Key Questions
1. Should BYOK be a first-class feature or Requesty-specific?
Verdict: First-Class, Standardized Feature
The Council unanimously rejects limiting BYOK to Requesty. "Direct API" usage is inherently BYOK.
Recommendation: Implement a standardized credential-management interface (a CredentialManager), built from the key-management types and BaseRouter hooks below:
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class KeyManagementModel(Enum):
    PROVIDER_KEYS = "provider"  # Router manages keys (OpenRouter default)
    USER_KEYS = "user"  # User provides their own keys
    HYBRID = "hybrid"  # Router can use either


@dataclass
class KeyConfiguration:
    model: KeyManagementModel
    user_keys: dict[str, str] | None = None
    model_key_overrides: dict[str, str] | None = None


class BaseRouter(ABC):
    @property
    @abstractmethod
    def supported_key_models(self) -> list[KeyManagementModel]:
        """What key management approaches does this router support?"""

    @abstractmethod
    def configure_keys(self, config: KeyConfiguration) -> None:
        """Configure key management for this router."""
```
2. How should router health be surfaced in MCP health checks?
Verdict: Tiered Health Model
Checking every router on every heartbeat is rejected due to latency and rate-limit risks.
| Health Tier | Behavior |
|---|---|
| Tier 1 (Fast/Default) | Validate config + connectivity to default router only |
| Tier 2 (Diagnostic) | On-demand health_check(deep=True) that probes all configured gateways |
```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


@dataclass
class RouterHealth:
    router_id: str
    status: HealthStatus
    latency_ms: float | None
    last_check: datetime
    circuit_open: bool = False
    consecutive_failures: int = 0


@dataclass
class GatewayHealthReport:
    overall_status: HealthStatus
    default_router: RouterHealth
    fallback_routers: list[RouterHealth]
```
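The two health tiers can be sketched as a single function whose deep flag widens the probe set from the default gateway to every configured gateway, probing concurrently with a per-probe timeout. To stay self-contained this sketch returns plain dicts whose status strings match the HealthStatus values above instead of RouterHealth objects; the probe callables, the function name gateway_health, and the 5-second timeout are all assumptions.

```python
import asyncio
import time
from typing import Awaitable, Callable, Mapping


async def gateway_health(
    default: str,
    probes: Mapping[str, Callable[[], Awaitable[None]]],
    deep: bool = False,
    timeout_s: float = 5.0,
) -> dict[str, dict]:
    """Tier 1 (deep=False) probes only the default gateway; Tier 2 probes all."""
    targets = list(probes) if deep else [default]

    async def probe(name: str) -> tuple[str, dict]:
        start = time.monotonic()
        try:
            await asyncio.wait_for(probes[name](), timeout_s)
        except Exception as exc:  # timeouts and connection errors alike
            return name, {"status": "unhealthy", "error": repr(exc)}
        latency_ms = (time.monotonic() - start) * 1000
        return name, {"status": "healthy", "latency_ms": latency_ms}

    # Probe concurrently so a deep check costs roughly one slow probe, not the sum
    results = await asyncio.gather(*(probe(name) for name in targets))
    return dict(results)
```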
3. What's the right default for new installations?
Verdict: OpenRouter Strict Default + "Auto" Opt-in
To ensure backward compatibility, OpenRouter must remain the default.
Smart Detection (opt-in via LLM_COUNCIL_ROUTER=auto):
1. If REQUESTY_API_KEY present → Requesty
2. Else if OPENROUTER_API_KEY present → OpenRouter
3. Else if direct provider keys present → Direct
4. Else → Fail fast with clear configuration error
Key Principle: Defaults should be predictable. Auto-detection should inform, not decide.
```text
INFO: Detected Anthropic and OpenAI API keys. Consider LLM_COUNCIL_ROUTER=direct for lowest latency.
INFO: Using default router: openrouter
```
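The smart-detection order above can be sketched as follows, assuming detection runs only when LLM_COUNCIL_ROUTER is exactly "auto" and that an empty or whitespace-only variable counts as absent. select_router, DIRECT_PROVIDER_KEYS, and RouterConfigError are hypothetical names; the error stands in for "fail fast with clear configuration error".

```python
from typing import Mapping

DIRECT_PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")


class RouterConfigError(RuntimeError):
    """Hypothetical error raised when no usable gateway can be determined."""


def select_router(env: Mapping[str, str]) -> str:
    """Resolve LLM_COUNCIL_ROUTER, applying smart detection only for 'auto'."""
    choice = env.get("LLM_COUNCIL_ROUTER", "openrouter").strip().lower()
    if choice != "auto":
        # Explicit or default choice: detection never overrides it
        return choice

    def present(name: str) -> bool:
        return bool(env.get(name, "").strip())

    if present("REQUESTY_API_KEY"):
        return "requesty"
    if present("OPENROUTER_API_KEY"):
        return "openrouter"
    if any(present(key) for key in DIRECT_PROVIDER_KEYS):
        return "direct"
    raise RouterConfigError(
        "LLM_COUNCIL_ROUTER=auto but no gateway keys were found; set REQUESTY_API_KEY, "
        "OPENROUTER_API_KEY, or a provider key such as ANTHROPIC_API_KEY"
    )
```

Keeping the early return for any non-"auto" value is what preserves the "inform, not decide" principle: detected keys can still be reported in logs, but they never change the router a user configured or the OpenRouter default.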
4. Should fallback chains be automatic or explicit?
Verdict: Explicit Configuration Only (Safety First)
Automatic fallback is deemed too risky:
- Silently falling back from GPT-4 to a different model changes agent behavior
- Can cause hallucinations that are hard to debug
- May violate enterprise constraints (data residency, vendor allowlists)
Fallback Policy Configuration:
```python
from dataclasses import dataclass, field
from enum import Enum


class FallbackBehavior(Enum):
    EXPLICIT = "explicit"  # only gateways listed in fallback_chain are tried


@dataclass
class FallbackPolicy:
    behavior: FallbackBehavior = FallbackBehavior.EXPLICIT
    fallback_chain: list[str] = field(default_factory=list)

    # Trigger conditions
    fallback_on_rate_limit: bool = True
    fallback_on_timeout: bool = True
    fallback_on_server_error: bool = True  # 5xx

    # Never fallback on user errors (should surface immediately)
    fallback_on_auth_error: bool = False  # 401/403
    fallback_on_invalid_request: bool = False  # 400
    fallback_on_content_filter: bool = False
```
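A sketch of how one failed gateway call might be mapped onto these flags. FallbackTriggers is a hypothetical stand-in that copies the trigger flags so the block runs on its own. It assumes the caller already knows the HTTP status, whether the call timed out, and whether a content filter fired; how a content-filter refusal is detected differs per gateway and is not specified here. Treating 429 as the rate-limit signal follows standard HTTP semantics; routing unknown failures to "no fallback" is a design assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackTriggers:
    """Hypothetical stand-in for the trigger flags on FallbackPolicy."""

    fallback_on_rate_limit: bool = True
    fallback_on_timeout: bool = True
    fallback_on_server_error: bool = True
    fallback_on_auth_error: bool = False
    fallback_on_invalid_request: bool = False
    fallback_on_content_filter: bool = False


def should_fallback(
    policy: FallbackTriggers,
    status_code: Optional[int] = None,
    timed_out: bool = False,
    content_filtered: bool = False,
) -> bool:
    """Decide whether a failed call may be retried on the next gateway."""
    if timed_out:
        return policy.fallback_on_timeout
    if content_filtered:
        return policy.fallback_on_content_filter
    if status_code is None:
        return False  # unknown failure: surface it rather than guess
    if status_code == 429:
        return policy.fallback_on_rate_limit
    if status_code in (401, 403):
        return policy.fallback_on_auth_error
    if 500 <= status_code <= 599:
        return policy.fallback_on_server_error
    if status_code == 400:
        return policy.fallback_on_invalid_request
    return False  # other 4xx and unexpected codes are not retried elsewhere
```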
5. How to handle cost tracking across multiple routers?
Verdict: Unified Data Model with Router-Specific Adapters
```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass
class UnifiedCostRecord:
    """Router-agnostic cost representation."""

    timestamp: datetime
    model: str
    router: str
    input_tokens: int
    output_tokens: int
    cost_usd: Decimal
    router_reported_cost: Decimal | None = None
    pricing_source: str = "calculated"  # or "router_reported"
```
Key insight: Prefer router-reported costs when available, fall back to calculated costs using a maintained pricing table, always record both for reconciliation.
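The preference rule can be sketched in a few lines: compute the table-based cost every time, use the router-reported figure as cost_usd when it exists, and return both so they can be reconciled later. The pricing table, its per-million-token unit, the model name, and the prices are placeholders, not real rates; calculate_cost and resolve_cost are hypothetical helpers. Decimal is used throughout so that small per-request costs do not accumulate floating-point error.

```python
from decimal import Decimal
from typing import Mapping, Optional, Tuple

# Placeholder pricing table: USD per 1M tokens as (input, output)
PRICING_PER_MTOK: Mapping[str, Tuple[Decimal, Decimal]] = {
    "example/model-a": (Decimal("3.00"), Decimal("15.00")),
}


def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    in_price, out_price = PRICING_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / Decimal(1_000_000)


def resolve_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    router_reported: Optional[Decimal] = None,
) -> dict:
    """Prefer the router-reported cost, but always keep the calculated figure."""
    calculated = calculate_cost(model, input_tokens, output_tokens)
    if router_reported is not None:
        return {"cost_usd": router_reported, "pricing_source": "router_reported",
                "calculated_cost": calculated, "router_reported_cost": router_reported}
    return {"cost_usd": calculated, "pricing_source": "calculated",
            "calculated_cost": calculated, "router_reported_cost": None}
```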
Critical Risks Identified (Question 6)
The Council identified Message Format Divergence as the highest underestimated risk.
| Risk | Severity | Mitigation |
|---|---|---|
| Message Format Divergence | HIGH | Define Canonical Internal Message Format with per-router transformers |
| Feature Parity Leaks | HIGH | RouterCapabilities flags; skip incompatible fallbacks |
| Compliance/Data Routing | HIGH | Explicit allowlists, region locks, "no aggregator" mode |
| Streaming Behavior Inconsistency | MEDIUM | Define streaming contract with clear chunk semantics |
| Rate Limit Semantic Differences | MEDIUM | Explicit RateLimitInfo abstraction |
| Testing Matrix Explosion | HIGH | Conformance test suite with recorded fixtures |
| Circuit Breaker in Distributed Deployments | MEDIUM | Local breakers with jitter; optional shared state (Redis) |
| Secret Handling/Leakage | MEDIUM | Strict redaction, structured secret objects |
Canonical Message Format Required:
```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class CanonicalMessage:
    """Internal representation; routers transform to/from this."""

    role: str  # system, user, assistant, tool
    content: list[ContentBlock]
    tool_calls: list[ToolCall] = field(default_factory=list)  # ToolCall defined elsewhere
    tool_call_id: str | None = None


@dataclass
class ContentBlock:
    type: str  # text, image, tool_use, tool_result
    text: str | None = None
    image_url: str | None = None
    tool_use: dict[str, Any] | None = None
```
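To illustrate the per-router transformer idea, here is a minimal sketch that renders canonical messages in the OpenAI Chat Completions shape: plain string content when every block is text, otherwise an array of "text" and "image_url" parts, plus tool_call_id on tool results. It redefines trimmed copies of the two dataclasses so it runs standalone, deliberately leaves out tool_use/tool_result blocks and assistant tool_calls, and joins multiple text blocks with a newline; that join and the function name to_openai_chat are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ContentBlock:
    type: str  # "text" or "image" in this sketch
    text: Optional[str] = None
    image_url: Optional[str] = None


@dataclass
class CanonicalMessage:
    role: str
    content: list = field(default_factory=list)
    tool_call_id: Optional[str] = None


def to_openai_chat(messages: list) -> list[dict[str, Any]]:
    """Render canonical messages as OpenAI-style chat messages (text and images only)."""
    out = []
    for msg in messages:
        content: Any
        if all(block.type == "text" for block in msg.content):
            # Plain string content is the most widely accepted form
            content = "\n".join(block.text or "" for block in msg.content)
        else:
            parts = []
            for block in msg.content:
                if block.type == "text":
                    parts.append({"type": "text", "text": block.text or ""})
                elif block.type == "image":
                    parts.append({"type": "image_url", "image_url": {"url": block.image_url}})
                else:
                    raise ValueError(f"Unsupported block type in this sketch: {block.type}")
            content = parts
        entry: dict[str, Any] = {"role": msg.role, "content": content}
        if msg.tool_call_id is not None:
            entry["tool_call_id"] = msg.tool_call_id
        out.append(entry)
    return out
```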
Interface Redesign (Question 7)
Verdict: Interface is too minimal; add capabilities discovery and health probing
```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, AsyncIterator

# CanonicalMessage and RouterHealth are defined above; CompletionResponse,
# StreamChunk and CanonicalRequest are left to the implementation.


class BaseRouter(ABC):
    @property
    @abstractmethod
    def router_id(self) -> str:
        """Unique identifier for this router."""

    @property
    @abstractmethod
    def capabilities(self) -> RouterCapabilities:
        """What this router supports."""

    @abstractmethod
    async def complete(
        self,
        messages: list[CanonicalMessage],
        model: str,
        **kwargs: Any,
    ) -> CompletionResponse:
        """Non-streaming completion."""

    @abstractmethod
    def complete_stream(
        self,
        messages: list[CanonicalMessage],
        model: str,
        **kwargs: Any,
    ) -> AsyncIterator[StreamChunk]:
        """Streaming completion; implementations are typically async generators."""

    async def health_check(self, deep: bool = False) -> RouterHealth:
        """Tiered health check."""
        raise NotImplementedError

    def validate_request(self, request: CanonicalRequest) -> bool:
        """Pre-flight check for context limits and feature support."""
        raise NotImplementedError


@dataclass
class RouterCapabilities:
    supports_streaming: bool = True
    supports_tools: bool = True
    supports_vision: bool = True
    supports_json_mode: bool = True
    supports_byok: bool = False
    requires_byok: bool = False
    provides_cost_in_response: bool = False
    max_context_window: int | None = None
```
Additional Architectural Recommendations
1. Configuration File Support
Environment variables become unwieldy as per-router settings grow. Support YAML configuration:
```yaml
# llm_council_routing.yaml
default_router: openrouter
routers:
  openrouter:
    api_key: ${OPENROUTER_API_KEY}
  requesty:
    api_key: ${REQUESTY_API_KEY}
  direct:
    providers:
      anthropic:
        api_key: ${ANTHROPIC_API_KEY}
      openai:
        api_key: ${OPENAI_API_KEY}
fallback:
  behavior: explicit
  chain: [openrouter, direct]
model_routing:
  "anthropic/*": direct
  "openai/*": direct
```
2. Circuit Breaker Defaults
```python
from dataclasses import dataclass


@dataclass
class CircuitBreakerConfig:
    failure_threshold: int = 5
    failure_window_seconds: int = 60
    reset_timeout_seconds: int = 30
    half_open_max_failures: int = 1
```
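The earlier CircuitBreaker counts consecutive failures and has no window or explicit half-open state. A sketch of a breaker driven by these four settings follows: failures are kept in a sliding time window, the circuit opens when the window holds failure_threshold of them, becomes half-open after reset_timeout_seconds, closes on a success, and reopens after half_open_max_failures failures while half-open. The class name, the injectable clock (used for testing), and the exact state transitions are assumptions; the ADR only lists the defaults.

```python
import time
from collections import deque
from typing import Callable, Deque


class WindowedCircuitBreaker:
    """closed -> open -> half_open -> closed (or back to open) breaker sketch."""

    def __init__(self, failure_threshold: int = 5, failure_window_seconds: float = 60,
                 reset_timeout_seconds: float = 30, half_open_max_failures: int = 1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.window = failure_window_seconds
        self.reset_timeout = reset_timeout_seconds
        self.half_open_max_failures = half_open_max_failures
        self.clock = clock
        self.state = "closed"
        self.opened_at = 0.0
        self.failures: Deque[float] = deque()
        self.half_open_failures = 0

    def allow_request(self) -> bool:
        # After the reset timeout an open circuit lets trial requests through
        if self.state == "open" and self.clock() - self.opened_at >= self.reset_timeout:
            self.state = "half_open"
            self.half_open_failures = 0
        return self.state != "open"

    def record_success(self) -> None:
        self.state = "closed"
        self.failures.clear()

    def record_failure(self) -> None:
        now = self.clock()
        if self.state == "half_open":
            self.half_open_failures += 1
            if self.half_open_failures >= self.half_open_max_failures:
                self._open(now)
            return
        self.failures.append(now)
        # Drop failures that fell out of the sliding window
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state = "open"
        self.opened_at = now
        self.failures.clear()
```

Per the risk table above, each process would hold its own breakers; sharing state across deployments (for example via Redis) is a separate, optional concern.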
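3. MockRouter for Testing
```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MockRouter(BaseRouter):
    """For testing: supports programmed responses and failure injection."""

    # MockResponse is a test fixture type; the abstract BaseRouter methods
    # (complete, complete_stream, ...) must also be implemented before use
    responses: dict[str, MockResponse] = field(default_factory=dict)
    failure_rate: float = 0.0
    calls: list[dict] = field(default_factory=list)  # For assertions
```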
Implementation Revision (Council-Informed)
| Phase | Original | Council Revision |
|---|---|---|
| Phase 1 | Abstraction Layer | Add Canonical Message Schema + RouterCapabilities |
| Phase 2 | Requesty Integration | Add standardized CredentialManager (not Requesty-specific) |
| Phase 3 | Direct API Support | Include Message Transformers for each provider |
| Phase 4 | Configuration UI | Add YAML config support, validation tools |
Actions Required Before Approval
- Define Canonical Message Format Specification
- Add configuration file support alongside env vars
- Expand `BaseRouter` interface with capabilities and health_check
- Specify circuit breaker defaults
- Include MockRouter in implementation for testing
- Document migration path explicitly
- Update ADR-011 for multi-router cost tracking
- [Harmonization] Clarify terminology: use "gateway routing" consistently
- [Harmonization] Document integration with ADR-020 (completed in this revision)
- [Harmonization] Define unified config schema for both ADR-020 and ADR-023
Rollback Triggers
```yaml
automatic_rollback:
  - backward_compatibility_break_detected
  - fallback_error_rate > 10%
  - config_validation_failure_rate > 5%
  - latency_overhead > 100ms
```
References
Related ADRs (Unified Routing Architecture)
- ADR-020: Not Diamond Integration Strategy - Layer 2 (Query Triage)
- ADR-022: Tiered Model Selection - Layer 1 (Tier Selection)
- ADR-024: Unified Routing Architecture - Coordination layer