# Creating Custom Skills
This guide explains how to create custom agent skills for LLM Council verification workflows.
## Skill Structure
Each skill is a directory containing:
```
.github/skills/your-skill/
├── SKILL.md            # Required: Instructions and metadata
└── references/         # Optional: Additional resources
    ├── rubric.md       # Scoring guidelines
    └── examples.md     # Example usage
```
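This layout can be scaffolded in a few lines of `pathlib` (a minimal sketch; the files start empty and are filled in below):

```python
from pathlib import Path

# Scaffold the layout shown above; the files start empty.
skill = Path(".github/skills/your-skill")
(skill / "references").mkdir(parents=True, exist_ok=True)
(skill / "SKILL.md").touch()                    # required: instructions and metadata
(skill / "references" / "rubric.md").touch()    # optional: scoring guidelines
(skill / "references" / "examples.md").touch()  # optional: example usage
```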
## SKILL.md Format

The `SKILL.md` file uses YAML frontmatter followed by markdown content:
```markdown
---
name: your-skill
description: |
  Brief description of what this skill does.
  Include keywords for discovery.
  Keywords: keyword1, keyword2, keyword3
license: MIT
compatibility: "llm-council >= 2.0"
metadata:
  category: your-category
  domain: your-domain
  author: your-name
  repository: https://github.com/your/repo
allowed-tools: "Read Grep Glob mcp:llm-council/verify"
---

# Your Skill Name

Main content and instructions here.

## When to Use

- Use case 1
- Use case 2

## Workflow

1. Step one
2. Step two
3. Step three

## Progressive Disclosure

- **Level 1**: This metadata (~X tokens)
- **Level 2**: Full instructions above (~Y tokens)
- **Level 3**: See `references/rubric.md` for detailed scoring
```
## Required Fields
| Field | Description |
|---|---|
| `name` | Skill identifier (lowercase, hyphens) |
| `description` | Multi-line description with keywords |
## Optional Fields
| Field | Description |
|---|---|
| `license` | License identifier (MIT, Apache-2.0, etc.) |
| `compatibility` | Version requirements |
| `metadata.category` | Skill category for filtering |
| `metadata.domain` | Domain expertise area |
| `metadata.author` | Author name or organization |
| `metadata.repository` | Source code URL |
| `allowed-tools` | Space-separated tool permissions |
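To sanity-check a skill's frontmatter against this field contract before loading it, a short script can parse it directly. This is only a sketch assuming PyYAML is installed; `SkillLoader` (described below) is the supported way to load skills:

```python
from pathlib import Path

import yaml  # assumes PyYAML is available

REQUIRED_FIELDS = {"name", "description"}

def check_frontmatter(skill_dir: Path) -> dict:
    """Parse SKILL.md frontmatter and verify required fields are present."""
    text = (skill_dir / "SKILL.md").read_text()
    # The frontmatter sits between the first two '---' markers.
    _, frontmatter, _ = text.split("---", 2)
    fields = yaml.safe_load(frontmatter)
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"SKILL.md is missing required fields: {sorted(missing)}")
    return fields

fields = check_frontmatter(Path(".github/skills/your-skill"))
print(fields["name"])
```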
## Categories and Domains
**Standard Categories:**

- `verification` - General verification tasks
- `code-review` - Code review and PR feedback
- `ci-cd` - CI/CD pipeline integration
- `documentation` - Documentation review
- `testing` - Test generation and validation

**Standard Domains:**

- `software-engineering` - General development
- `devops` - Operations and deployment
- `security` - Security assessment
- `quality` - Quality assurance
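Categories make filtering straightforward. A short sketch built on the `SkillLoader` API documented later in this guide:

```python
from pathlib import Path

from llm_council.skills import SkillLoader

# Collect every skill tagged with a given category
# (see "Using the Skill Loader" below for the full API).
loader = SkillLoader(Path(".github/skills"))
security_skills = [
    name
    for name in loader.list_skills()
    if loader.load_metadata(name).category == "security"
]
```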
## Creating Rubrics

Rubrics define scoring criteria for your skill. Create `references/rubric.md`:
```markdown
# Your Skill Rubrics

## Core Dimensions

### Accuracy (Weight: 30%)

| Score | Anchor | Description |
|-------|--------|-------------|
| 9-10 | **Excellent** | Perfect accuracy |
| 7-8 | **Good** | Minor issues |
| 5-6 | **Mixed** | Some errors |
| 3-4 | **Poor** | Significant errors |
| 1-2 | **Critical** | Fundamental errors |

### Completeness (Weight: 25%)

[Similar table...]

## Domain-Specific Focus

### Your Focus Area

When `rubric_focus: YourFocus` is specified:

**Additional Checks:**

- Check 1
- Check 2

**Red Flags (automatic FAIL):**

- Red flag 1
- Red flag 2

## Verdict Determination

| Confidence | Verdict | Exit Code |
|------------|---------|-----------|
| ≥ threshold | PASS | 0 |
| < threshold, no blockers | UNCLEAR | 2 |
| Any blockers | FAIL | 1 |
```
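To make the mechanics concrete, here is an illustrative sketch of how the weighted dimension scores and the verdict table above might combine. The function and parameter names are hypothetical, not part of the `llm_council` API:

```python
def determine_verdict(
    scores: dict[str, float],   # 1-10 rating per rubric dimension
    weights: dict[str, float],  # rubric weights, e.g. {"accuracy": 0.30, ...}
    threshold: float,           # confidence threshold from your config
    blockers: list[str],        # red flags found during review
) -> tuple[str, int]:
    """Hypothetical helper: map scores and blockers to (verdict, exit code)."""
    # Weighted average on the 1-10 scale, normalized to 0-1 confidence.
    confidence = sum(scores[dim] * w for dim, w in weights.items()) / 10.0
    if blockers:                # any red flag is an automatic FAIL
        return "FAIL", 1
    if confidence >= threshold:
        return "PASS", 0
    return "UNCLEAR", 2         # below threshold, but nothing blocking
```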
## Token Efficiency Guidelines
Keep skills token-efficient with progressive disclosure:
| Level | Target | Content |
|---|---|---|
| Level 1 | ~100-200 tokens | YAML frontmatter only |
| Level 2 | ~500-1000 tokens | Full SKILL.md |
| Level 3 | Variable | Resources on demand |
**Tips:**

- Keep descriptions concise
- Use bullet points over prose
- Put detailed examples in `references/`
- Use tables for structured data
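To check whether a skill stays inside these budgets, a rough estimate is enough. A common heuristic (an approximation, not the loader's own counter) is about four characters per token:

```python
from pathlib import Path

def rough_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    return len(text) // 4

skill_md = Path(".github/skills/your-skill/SKILL.md").read_text()
frontmatter = skill_md.split("---", 2)[1]  # Level 1: frontmatter only
print(f"Level 1: ~{rough_tokens(frontmatter)} tokens (target: 100-200)")
print(f"Level 2: ~{rough_tokens(skill_md)} tokens (target: 500-1000)")
```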
## Using the Skill Loader
```python
from pathlib import Path

from llm_council.skills import SkillLoader

# Initialize loader
loader = SkillLoader(Path(".github/skills"))

# List available skills
skills = loader.list_skills()

# Level 1: Load metadata
metadata = loader.load_metadata("your-skill")
print(f"Name: {metadata.name}")
print(f"Category: {metadata.category}")
print(f"Tokens: {metadata.estimated_tokens}")

# Level 2: Load full content
full = loader.load_full("your-skill")
print(f"Body length: {len(full.body)}")

# Level 3: Load resources
resources = loader.list_resources("your-skill")
if "rubric.md" in resources:
    rubric = loader.load_resource("your-skill", "rubric.md")
```
## Testing Your Skill
Create integration tests to validate your skill:
```python
from pathlib import Path

import pytest

from llm_council.skills import SkillLoader

SKILLS_DIR = Path(".github/skills")


@pytest.fixture
def loader():
    return SkillLoader(SKILLS_DIR)


class TestYourSkill:
    def test_skill_discoverable(self, loader):
        """Skill should be discoverable."""
        assert "your-skill" in loader.list_skills()

    def test_metadata_loads(self, loader):
        """Metadata should load correctly."""
        metadata = loader.load_metadata("your-skill")
        assert metadata.name == "your-skill"
        assert metadata.category is not None

    def test_metadata_is_compact(self, loader):
        """Metadata should be token-efficient."""
        metadata = loader.load_metadata("your-skill")
        assert metadata.estimated_tokens < 300

    def test_full_content_loads(self, loader):
        """Full content should load."""
        full = loader.load_full("your-skill")
        assert len(full.body) > 0

    def test_resources_available(self, loader):
        """Resources should be listed."""
        resources = loader.list_resources("your-skill")
        assert "rubric.md" in resources
```
## Example: Custom Security Audit Skill
```markdown
---
name: security-audit
description: |
  Security audit using LLM Council for vulnerability detection.
  Keywords: security, audit, vulnerability, OWASP, CVE
license: MIT
compatibility: "llm-council >= 2.0"
metadata:
  category: security
  domain: security
  author: your-team
allowed-tools: "Read Grep Glob mcp:llm-council/verify"
---

# Security Audit Skill

Multi-model security assessment for code and configurations.

## When to Use

- Pre-deployment security review
- Dependency vulnerability scanning
- Configuration security audit

## Workflow

1. **Collect Targets**: Specify files or directories to audit
2. **Run Audit**: Invoke `mcp:llm-council/verify` with security focus
3. **Review Findings**: Process blocking issues and suggestions
4. **Remediate**: Address critical and major issues

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | No security issues |
| 1 | Critical vulnerabilities found |
| 2 | Manual security review needed |

## Progressive Disclosure

- **Level 1**: This metadata (~150 tokens)
- **Level 2**: Full instructions (~600 tokens)
- **Level 3**: See `references/security-rubric.md`
```
## Skill Distribution
Skills can be distributed via:

- **In-Repository**: Commit to `.github/skills/` for project use
- **PyPI Package**: Bundle in `src/your_package/skills/bundled/` (see the sketch below)
- **Skills Marketplace**: Submit to community marketplaces
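For PyPI distribution, the bundled directory can be resolved at runtime with `importlib.resources`. A sketch assuming a regular (non-zipped) install, with `your_package` standing in for your distribution's import name:

```python
from importlib.resources import files
from pathlib import Path

from llm_council.skills import SkillLoader

# Resolve the skills directory shipped inside the installed package.
bundled = Path(str(files("your_package") / "skills" / "bundled"))
loader = SkillLoader(bundled)
print(loader.list_skills())
```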
## Best Practices
- **Clear Purpose**: Each skill should have one clear purpose
- **Token Efficiency**: Keep Level 1 under 200 tokens
- **Actionable Output**: Provide specific remediation suggestions
- **Test Coverage**: Write integration tests for validation
- **Documentation**: Include examples in `references/`
- **Exit Codes**: Use standard 0/1/2 for CI/CD compatibility