# Introducing Agent Skills: Multi-Model Consensus for AI Code Assistants
Published: December 2025
Your AI code assistant just wrote 500 lines of code. Is it correct? Is it secure? Does it follow your team's patterns?
You could review it yourself. Or you could let multiple AI models deliberate and reach consensus.
That's what Agent Skills bring to LLM Council.
## The Problem with Single-Model Verification
AI code assistants are powerful, but they have a fundamental limitation: they can't reliably verify their own work.
A model that confidently generates buggy code will confidently assert that the code is correct. Self-review doesn't catch the blind spots because the same reasoning that produced the bug will miss it on review.
The traditional solution? Human review. But human review doesn't scale, and humans miss things too.
## Multi-Model Consensus: A Better Way
What if instead of one model reviewing its own work, you had multiple models:
- Generate independent responses to the same verification question
- Anonymously evaluate each other's assessments
- Synthesize a final verdict from the collective deliberation
This is LLM Council's core pattern. Now we've packaged it into Agent Skills that any AI code assistant can use.
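The pattern is easiest to see as code. Below is a minimal, self-contained sketch of the three stages; the model names and stub functions are illustrative placeholders, not LLM Council's actual API:

```python
# Minimal sketch of the three-stage council pattern.
# Model names and stub functions are placeholders, not the real LLM Council API.
COUNCIL = ["model-a", "model-b", "model-c"]

def answer(model: str, question: str) -> str:
    """Stage 1 (stub): each model responds to the verification question independently."""
    return f"{model}: assessment of '{question}'"

def peer_review(model: str, anonymous_answers: list[str]) -> str:
    """Stage 2 (stub): a model evaluates the other models' anonymized answers."""
    return f"{model}: ranked {len(anonymous_answers)} anonymous answers"

def synthesize(answers: list[str], reviews: list[str]) -> str:
    """Stage 3 (stub): a chairman model folds answers and reviews into one verdict."""
    return f"verdict from {len(answers)} answers and {len(reviews)} peer reviews"

question = "Does the implementation match the requirements?"
answers = {m: answer(m, question) for m in COUNCIL}
reviews = [peer_review(m, [a for other, a in answers.items() if other != m]) for m in COUNCIL]
print(synthesize(list(answers.values()), reviews))
```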
## Three Skills, Three Use Cases
### council-verify: General Work Verification
Use when you need to verify that implementation matches requirements. The council evaluates:
- Accuracy (30%): Is the implementation correct?
- Completeness (25%): Are all requirements addressed?
- Clarity (20%): Is the code understandable?
- Conciseness (15%): Is it appropriately sized?
- Relevance (10%): Does it solve the right problem?
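To make the weighting concrete, here is a small sketch that combines per-dimension scores (on the 0-10 scale implied by the ceiling rule below) with these weights. The helper is illustrative, not the library's implementation:

```python
# Sketch: combining per-dimension scores (0-10) using the council-verify weights.
VERIFY_WEIGHTS = {
    "accuracy": 0.30,
    "completeness": 0.25,
    "clarity": 0.20,
    "conciseness": 0.15,
    "relevance": 0.10,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Weighted sum of dimension scores (illustrative helper)."""
    return sum(VERIFY_WEIGHTS[dim] * scores[dim] for dim in VERIFY_WEIGHTS)

# Accurate but incomplete work still loses points on completeness.
print(weighted_score({"accuracy": 9, "completeness": 5, "clarity": 8,
                      "conciseness": 7, "relevance": 9}))  # 7.5
```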
### council-review: Code Review with Security Focus

Specialized for PR reviews, with accuracy weighted at 35% (versus 30% for general verification). Focus areas include:
- Security: SQL injection, XSS, secrets exposure, authentication flaws
- Performance: Algorithm complexity, N+1 queries, memory leaks
- Testing: Coverage gaps, flaky tests, missing edge cases
### council-gate: CI/CD Quality Gates
```yaml
# GitHub Actions
- name: Council Quality Gate
  run: |
    llm-council gate \
      --snapshot ${{ github.sha }} \
      --rubric-focus Security \
      --confidence-threshold 0.8
```
Returns structured exit codes for pipeline integration:
| Exit Code | Verdict | Action |
|---|---|---|
| 0 | PASS | Continue deployment |
| 1 | FAIL | Block deployment |
| 2 | UNCLEAR | Require human review |
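Outside of the bundled GitHub Action, a plain shell step can branch on these codes directly. A sketch, where the `gate` invocation mirrors the example above and `GIT_SHA` is a placeholder variable:

```bash
# Sketch: mapping the gate's exit code onto pipeline behaviour.
set +e
llm-council gate --snapshot "$GIT_SHA" --confidence-threshold 0.8
status=$?
set -e

case "$status" in
  0) echo "PASS: continue deployment" ;;
  1) echo "FAIL: block deployment"; exit 1 ;;
  2) echo "UNCLEAR: require human review"; exit 1 ;;
esac
```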
## Why This Matters: The Accuracy Ceiling Rule
Here's a key insight from ADR-016: well-written incorrect answers are more dangerous than poorly-written correct ones.
A confident, articulate response that's factually wrong can fool humans. It can also fool single-model self-review.
Multi-model consensus catches this. If one model generates a plausible-sounding bug, other models with different training biases will likely catch it during peer review.
We enforce this with the accuracy ceiling rule:
```python
def apply_accuracy_ceiling(accuracy: float, weighted_score: float) -> float:
    """Accuracy caps the maximum possible score."""
    if accuracy < 5:
        return min(weighted_score, 4.0)  # Significant errors
    if accuracy < 7:
        return min(weighted_score, 7.0)  # Mixed accuracy
    return weighted_score  # No ceiling
```
A beautifully written, highly-ranked response with an accuracy score of 4 gets capped at 4.0 overall. No amount of eloquence can overcome fundamental incorrectness.
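Using the illustrative `weighted_score` helper from earlier, that scenario looks like this:

```python
# Eloquent but inaccurate: a 7.5 weighted score is capped at 4.0 by the ceiling.
raw = weighted_score({"accuracy": 4, "completeness": 9, "clarity": 9,
                      "conciseness": 9, "relevance": 9})          # 7.5
print(apply_accuracy_ceiling(accuracy=4, weighted_score=raw))     # 4.0
```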
## Progressive Disclosure: Token Efficiency
Skills use progressive disclosure to minimize context window usage:
| Level | Content | Tokens |
|---|---|---|
| Level 1 | Metadata only | ~100-200 |
| Level 2 | Full SKILL.md | ~500-1000 |
| Level 3 | Resources (rubrics) | Variable |
Your AI assistant loads only what it needs:
```python
from pathlib import Path

from llm_council.skills import SkillLoader

loader = SkillLoader(Path(".github/skills"))

# Level 1: Quick discovery
metadata = loader.load_metadata("council-verify")
print(f"Tokens: {metadata.estimated_tokens}")  # ~150

# Level 2: Full instructions (only when needed)
full = loader.load_full("council-verify")

# Level 3: Resources on demand
rubrics = loader.load_resource("council-verify", "rubrics.md")
```
## Cross-Platform Compatibility
Skills live in .github/skills/, a location supported by:
- Claude Code
- VS Code Copilot
- Cursor
- Codex CLI
- Other MCP-compatible clients
```
.github/skills/
├── council-verify/
│   ├── SKILL.md
│   └── references/rubrics.md
├── council-review/
│   ├── SKILL.md
│   └── references/code-review-rubric.md
└── council-gate/
    ├── SKILL.md
    └── references/ci-cd-rubric.md
```
Install via PyPI and the skills come bundled (see Getting Started below).
## The Audit Trail
Every verification produces a complete transcript:
```
.council/logs/2025-12-31T10-30-00-abc123/
├── request.json   # Input snapshot
├── stage1.json    # Individual model responses
├── stage2.json    # Peer reviews (anonymized)
├── stage3.json    # Chairman synthesis
└── result.json    # Final verdict
```
This enables:
- Reproducibility: Re-run the same verification
- Debugging: Understand why a verdict was reached
- Compliance: Audit trail for regulated industries
- Improvement: Train on disagreement patterns
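For debugging, you can load a run's final verdict straight from its transcript directory. A quick sketch; the JSON field names here are assumptions (the action's outputs suggest at least a verdict and a confidence), since only the file layout above is given:

```python
# Sketch: inspecting a past run's verdict from its transcript directory.
# The JSON field names are assumptions; only the file layout above is given.
import json
from pathlib import Path

run_dir = Path(".council/logs/2025-12-31T10-30-00-abc123")
result = json.loads((run_dir / "result.json").read_text())

print(result.get("verdict"))     # e.g. "PASS", "FAIL", or "UNCLEAR"
print(result.get("confidence"))  # e.g. 0.85
```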
## Getting Started
### 1. Install LLM Council
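Assuming the package is published on PyPI under the project's name:

```bash
pip install llm-council
```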
### 2. Configure API Key
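The CI example below reads an OpenRouter key from `OPENROUTER_API_KEY`; for local use, export the same variable (how you store the key is up to you):

```bash
export OPENROUTER_API_KEY="your-openrouter-key"
```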
### 3. Use Skills in Your AI Assistant
Claude Code example:
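A typical invocation is just a natural-language request that names the skill; the file referenced here is only an illustration:

```
Use the council-verify skill to check whether the changes on this branch
satisfy the acceptance criteria in ISSUE.md, and summarize the council's verdict.
```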
Or invoke the same skills over MCP from any MCP-compatible client.
### 4. Integrate with CI/CD
```yaml
# .github/workflows/council-gate.yml
name: Council Quality Gate
on: [pull_request]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: amiable-dev/llm-council-action@v1
        with:
          snapshot: ${{ github.sha }}
          confidence-threshold: 0.8
        env:
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
```
The llm-council-action provides:
- Fast execution via pip caching (~3-5s after first run)
- Rich outputs (verdict, confidence, summary)
- GitHub Step Summary with collapsible details
- PR comments with evaluation results
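If you give the action step an `id`, those outputs can gate later steps. A sketch that slots into the workflow above; the step id is arbitrary, and the `verdict` output name follows the list above:

```yaml
      - uses: amiable-dev/llm-council-action@v1
        id: council
        with:
          snapshot: ${{ github.sha }}
        env:
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
      - name: Block on anything but PASS
        if: steps.council.outputs.verdict != 'PASS'
        run: exit 1
```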
## What's Next?
Agent Skills are the foundation for AI-assisted verification workflows. Coming soon:
- Security audit skill: Specialized for vulnerability detection
- Documentation review skill: Verify docs match code
- Test generation skill: Generate tests with council consensus
- Custom skill marketplace: Share skills with the community
## Try It Now
```bash
# Clone and explore
git clone https://github.com/amiable-dev/llm-council.git
cd llm-council

# Check out the skills
ls .github/skills/

# Read the detailed guide
cat docs/guides/skills.md
```
This post introduces ADR-034: Agent Skills Integration for Work Verification.
LLM Council is open source: github.com/amiable-dev/llm-council