
Introducing Agent Skills: Multi-Model Consensus for AI Code Assistants

Published: December 2025


Your AI code assistant just wrote 500 lines of code. Is it correct? Is it secure? Does it follow your team's patterns?

You could review it yourself. Or you could let multiple AI models deliberate and reach consensus.

That's what Agent Skills bring to LLM Council.

The Problem with Single-Model Verification

AI code assistants are powerful, but they have a fundamental limitation: they can't reliably verify their own work.

A model that confidently generates buggy code will confidently assert that the code is correct. Self-review doesn't catch the blind spots because the same reasoning that produced the bug will miss it on review.

The traditional solution? Human review. But human review doesn't scale, and humans miss things too.

Multi-Model Consensus: A Better Way

What if instead of one model reviewing its own work, you had multiple models:

  1. Generate independent responses to the same verification question
  2. Anonymously evaluate each other's assessments
  3. Synthesize a final verdict from the collective deliberation

This is LLM Council's core pattern. Now we've packaged it into Agent Skills that any AI code assistant can use.
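
In pseudocode, the deliberation loop looks roughly like this (a minimal sketch; ask, review, and synthesize stand in for the real model calls and are not the library's API):

import random

def run_council(question, models, chairman, ask, review, synthesize):
    """Three-stage consensus: independent answers, anonymous peer review, synthesis."""
    # Stage 1: every model answers the same question independently.
    responses = [ask(model, question) for model in models]

    # Stage 2: shuffle the answers so reviewers can't tell who wrote what,
    # then let each model evaluate its (anonymized) peers.
    anonymized = responses[:]
    random.shuffle(anonymized)
    reviews = [review(model, question, anonymized) for model in models]

    # Stage 3: a chairman model synthesizes the final verdict
    # from the collective deliberation.
    return synthesize(chairman, question, anonymized, reviews)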

Three Skills, Three Use Cases

council-verify: General Work Verification

mcp://llm-council/verify --snapshot abc123 --file-paths "src/main.py"

Use this skill when you need to verify that an implementation matches its requirements. The council scores five weighted dimensions (a scoring sketch follows the list):

  • Accuracy (30%): Is the implementation correct?
  • Completeness (25%): Are all requirements addressed?
  • Clarity (20%): Is the code understandable?
  • Conciseness (15%): Is it appropriately sized?
  • Relevance (10%): Does it solve the right problem?
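
A rough sketch of how those weights could combine into a single score (the weights come from the list above; the helper itself is illustrative, not the library's API):

# Weights from the council-verify rubric above.
VERIFY_WEIGHTS = {
    "accuracy": 0.30,
    "completeness": 0.25,
    "clarity": 0.20,
    "conciseness": 0.15,
    "relevance": 0.10,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (e.g. on a 0-10 scale) into one number."""
    return sum(weight * scores[dim] for dim, weight in VERIFY_WEIGHTS.items())

print(weighted_score({"accuracy": 9, "completeness": 8, "clarity": 7,
                      "conciseness": 8, "relevance": 9}))  # 8.2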

council-review: Code Review with Security Focus

council-review --file-paths "src/api.py" --rubric-focus Security

Specialized for PR reviews, with accuracy weighted at 35% (higher than in general verification). Focus areas include:

  • Security: SQL injection, XSS, secrets exposure, authentication flaws
  • Performance: Algorithm complexity, N+1 queries, memory leaks
  • Testing: Coverage gaps, flaky tests, missing edge cases

council-gate: CI/CD Quality Gates

# GitHub Actions
- name: Council Quality Gate
  run: |
    llm-council gate \
      --snapshot ${{ github.sha }} \
      --rubric-focus Security \
      --confidence-threshold 0.8

Returns structured exit codes for pipeline integration:

Exit code   Verdict   Action
0           PASS      Continue deployment
1           FAIL      Block deployment
2           UNCLEAR   Require human review
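
Outside GitHub Actions, a pipeline script can branch on those codes directly. A minimal sketch (the gate invocation mirrors the example above; the surrounding script is illustrative):

import subprocess
import sys

# Run the gate; 0 = PASS, 1 = FAIL, 2 = UNCLEAR (see the table above).
result = subprocess.run(
    ["llm-council", "gate", "--snapshot", "HEAD", "--confidence-threshold", "0.8"]
)

if result.returncode == 0:
    print("PASS: continuing deployment")
elif result.returncode == 1:
    sys.exit("FAIL: blocking deployment")
else:
    print("UNCLEAR: flagging for human review")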

Why This Matters: The Accuracy Ceiling Rule

Here's a key insight from ADR-016: well-written incorrect answers are more dangerous than poorly-written correct ones.

A confident, articulate response that's factually wrong can fool humans. It can also fool single-model self-review.

Multi-model consensus catches this. If one model generates a plausible-sounding bug, other models with different training biases will likely catch it during peer review.

We enforce this with the accuracy ceiling rule:

def apply_accuracy_ceiling(accuracy: float, weighted_score: float) -> float:
    """Accuracy caps the maximum possible score."""
    if accuracy < 5:
        return min(weighted_score, 4.0)  # Significant errors
    if accuracy < 7:
        return min(weighted_score, 7.0)  # Mixed accuracy
    return weighted_score  # No ceiling

A beautifully written, highly-ranked response with an accuracy score of 4 gets capped at 4.0 overall. No amount of eloquence can overcome fundamental incorrectness.
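
For example, using the function above:

# Weighted score 8.7 with an accuracy of 4 is capped at 4.0.
print(apply_accuracy_ceiling(accuracy=4, weighted_score=8.7))   # 4.0

# With solid accuracy, the weighted score passes through unchanged.
print(apply_accuracy_ceiling(accuracy=8, weighted_score=8.7))   # 8.7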

Progressive Disclosure: Token Efficiency

Skills use progressive disclosure to minimize context window usage:

Level     Content               Tokens
Level 1   Metadata only         ~100-200
Level 2   Full SKILL.md         ~500-1000
Level 3   Resources (rubrics)   Variable

Your AI assistant loads only what it needs:

from pathlib import Path

from llm_council.skills import SkillLoader

loader = SkillLoader(Path(".github/skills"))

# Level 1: Quick discovery
metadata = loader.load_metadata("council-verify")
print(f"Tokens: {metadata.estimated_tokens}")  # ~150

# Level 2: Full instructions (only when needed)
full = loader.load_full("council-verify")

# Level 3: Resources on demand
rubrics = loader.load_resource("council-verify", "rubrics.md")

Cross-Platform Compatibility

Skills live in .github/skills/, a location supported by:

  • Claude Code
  • VS Code Copilot
  • Cursor
  • Codex CLI
  • Other MCP-compatible clients

Each skill ships a SKILL.md alongside its reference rubrics:

.github/skills/
├── council-verify/
│   ├── SKILL.md
│   └── references/rubrics.md
├── council-review/
│   ├── SKILL.md
│   └── references/code-review-rubric.md
└── council-gate/
    ├── SKILL.md
    └── references/ci-cd-rubric.md

The skills come bundled with the PyPI package; install them into your repository with:

pip install llm-council-core
llm-council install-skills --target .github/skills

The Audit Trail

Every verification produces a complete transcript:

.council/logs/2025-12-31T10-30-00-abc123/
├── request.json      # Input snapshot
├── stage1.json       # Individual model responses
├── stage2.json       # Peer reviews (anonymized)
├── stage3.json       # Chairman synthesis
└── result.json       # Final verdict

This enables (see the sketch after the list):

  • Reproducibility: Re-run the same verification
  • Debugging: Understand why a verdict was reached
  • Compliance: Audit trail for regulated industries
  • Improvement: Train on disagreement patterns
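
A quick sketch of reading a transcript back (the directory layout is shown above; the JSON field names are assumptions for illustration):

import json
from pathlib import Path

# Path from the example transcript above.
log_dir = Path(".council/logs/2025-12-31T10-30-00-abc123")

# Final verdict; "verdict" and "confidence" are assumed field names.
result = json.loads((log_dir / "result.json").read_text())
print(result.get("verdict"), result.get("confidence"))

# The anonymized peer reviews explain why the council reached that verdict.
stage2 = json.loads((log_dir / "stage2.json").read_text())
print(json.dumps(stage2, indent=2))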

Getting Started

1. Install LLM Council

pip install llm-council-core

2. Configure API Key

export OPENROUTER_API_KEY=sk-or-v1-...

3. Use Skills in Your AI Assistant

Claude Code example:

/council-verify --snapshot HEAD --file-paths "src/feature.py"

Or via MCP:

mcp://llm-council/verify \
  --snapshot $(git rev-parse HEAD) \
  --file-paths "src/feature.py"

4. Integrate with CI/CD

# .github/workflows/council-gate.yml
name: Council Quality Gate
on: [pull_request]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: amiable-dev/llm-council-action@v1
        with:
          snapshot: ${{ github.sha }}
          confidence-threshold: 0.8
        env:
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}

The llm-council-action provides:

  • Fast execution via pip caching (~3-5s after first run)
  • Rich outputs (verdict, confidence, summary)
  • GitHub Step Summary with collapsible details
  • PR comments with evaluation results

What's Next?

Agent Skills are the foundation for AI-assisted verification workflows. Coming soon:

  • Security audit skill: Specialized for vulnerability detection
  • Documentation review skill: Verify docs match code
  • Test generation skill: Generate tests with council consensus
  • Custom skill marketplace: Share skills with the community

Try It Now

# Clone and explore
git clone https://github.com/amiable-dev/llm-council.git
cd llm-council

# Check out the skills
ls .github/skills/

# Read the detailed guide
cat docs/guides/skills.md

This post introduces ADR-034: Agent Skills Integration for Work Verification.

LLM Council is open source: github.com/amiable-dev/llm-council