git log --oneline --stat
HEAD
- Stars: 2.9k
- Forks: 98
- Updated: Apr 27, 2026
- License: MIT
- Version: v2.0.5
quickstart.sh
3 steps
- Install
// Drops SKILL.md into ~/.claude/skills/
$ claude skills add evals-runner
- Invoke
// Run from any project directory
$ claude --skill evals-runner "help me ship this"
- Iterate
// Re-run with edits; Claude keeps the skill loaded
$ claude --skill evals-runner "now refactor it"
evals-runner/
- references/
- SKILL.md
- README.md
SKILL.md
readonly
- name: Evals Runner
- slug: evals-runner
- version: v2.0.5
- license: MIT
- author: @evalforge
- repository: github.com/evalforge/evals-runner
- categories:
- tags: #evals #llm #benchmarks #judges #regression
- description: Build, run, and report on LLM evals. Pairwise comparisons, judges, regression detection.
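The description names pairwise comparisons and regression detection without showing how the two might fit together. The sketch below is only an illustration of the general idea, not the skill's actual implementation: the function names, the verdict labels (`"candidate"`, `"baseline"`, `"tie"`), and the 0.45 threshold are all assumptions made up for this example.

```python
from collections import Counter


def win_rate(verdicts):
    """Fraction of pairwise comparisons won by the candidate.

    Each verdict is a hypothetical judge label: "candidate", "baseline",
    or "tie". Ties count as half a win.
    """
    if not verdicts:
        raise ValueError("no verdicts to score")
    counts = Counter(verdicts)
    return (counts["candidate"] + 0.5 * counts["tie"]) / len(verdicts)


def is_regression(verdicts, threshold=0.45):
    """Flag a regression when the candidate's win rate is below the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return win_rate(verdicts) < threshold


# Five judged comparisons between a candidate and a baseline output.
verdicts = ["candidate", "baseline", "tie", "baseline", "baseline"]
print(win_rate(verdicts))       # 0.3
print(is_regression(verdicts))  # True
```

Counting a tie as half a win keeps the score symmetric: swapping the candidate and baseline labels turns a win rate of `r` into `1 - r`.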
features.md
3 capabilities
// What you can do with it
- Automates the tedious parts of the workflow.
- Gives Claude the right context, tools, and guardrails.
- Produces consistent, reviewable output every time.
README.md
evals-runner/README.md
5 sections