Glossary
Definitions of key terms used throughout PeerLM.
| Term | Definition |
|---|---|
| Generator | A model that produces responses to your prompts. The model being evaluated. |
| Evaluator | A model that judges the quality of generator responses against your criteria. |
| Suite | A saved evaluation configuration: which models, prompts, and criteria to use. |
| Run | A single execution of a suite. Produces a set of results. |
| Criteria | The dimensions on which responses are scored (e.g., Accuracy, Clarity). |
| Weight | How much a criterion or model tier contributes to the overall score or credit cost. |
| Credit Multiplier | The cost multiplier per model call based on tier (Standard=1x, Advanced=1x, Premium=2x, Frontier=3x). |
| Tier | A model's pricing/capability category: Standard, Advanced, Premium, or Frontier. |
| Baseline | A run marked as the reference point for comparing future runs of the same suite. |
| Cache Hit | A response reused from a previous run because the model version and prompt content are identical. Free. |
| Deterministic Mode | A setting that attempts temperature=0 and fixed seed for reproducible outputs. |
| System Prompt | Instructions sent as the system message to define the model's role or behavior. |
| Test Prompt | The user message a model responds to. The unit of evaluation. |
| Dataset | A named collection of test prompts for easy selection in suites. |
| Auto-Run | Automatic re-evaluation triggered by model updates, new models, or a schedule. |
| Overage | Credits consumed beyond your Pro plan's 1,000 monthly allocation, billed at $0.10/credit. The Free plan uses pay-as-you-go (PAYG) pricing at $0.20/credit. |
| Recompute | Re-aggregate scores from existing data without making any API calls. Always free. |
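
The credit arithmetic implied by the table above (tier multipliers, free cache hits, Pro-plan overage) can be sketched as follows. This is an illustrative model only; the function names and call structure are not part of the PeerLM API.

```python
# Illustrative sketch of PeerLM's credit math, based on the glossary.
# Multipliers and rates come from the table; everything else is assumed.

TIER_MULTIPLIER = {"Standard": 1, "Advanced": 1, "Premium": 2, "Frontier": 3}

def run_credits(calls):
    """Credits for one run. `calls` is a list of (tier, cache_hit) pairs."""
    # Cache hits are free; every other call costs its tier's multiplier.
    return sum(0 if hit else TIER_MULTIPLIER[tier] for tier, hit in calls)

def pro_overage_cost(monthly_credits, allocation=1000, rate=0.10):
    """Dollar cost of credits beyond the Pro plan's monthly allocation."""
    return max(0, monthly_credits - allocation) * rate
```

For example, a run with one Standard call, one Frontier call, and one Premium cache hit would cost `run_credits([("Standard", False), ("Frontier", False), ("Premium", True)])` = 4 credits, and a Pro account that used 1,250 credits in a month would owe `pro_overage_cost(1250)` = $25.00 in overage.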