Glossary
Definitions of key terms used throughout PeerLM.
| Term | Definition |
|---|---|
| Generator | A model that produces responses to your prompts. The model being evaluated. |
| Evaluator | A model that judges the quality of generator responses against your criteria. |
| Suite | A saved evaluation configuration: which models, prompts, and criteria to use. |
| Run | A single execution of a suite. Produces a set of results. |
| Criteria | The dimensions on which responses are scored (e.g., Accuracy, Clarity). |
| Weight | How much a criterion or model tier contributes to the overall score or credit cost. |
| Credit Multiplier | The cost multiplier per model call based on tier (Standard=1x, Advanced=1x, Premium=2x, Frontier=3x). |
| Tier | A model's pricing/capability category: Standard, Advanced, Premium, or Frontier. |
| Baseline | A run marked as the reference point for comparing future runs of the same suite. |
| Cache Hit | A response reused from a previous run because the model version and prompt content are identical. Free. |
| Deterministic Mode | A setting that attempts temperature=0 and fixed seed for reproducible outputs. |
| System Prompt | Instructions sent as the system message to define the model's role or behavior. |
| Test Prompt | The user message a model responds to. The unit of evaluation. |
| Dataset | A named collection of test prompts for easy selection in suites. |
| Auto-Run | Automatic re-evaluation triggered by model updates, new models, or a schedule. |
| Overage | Credits consumed beyond your Pro plan's 1,000 monthly allocation, billed at $0.10/credit. The Free plan uses pay-as-you-go (PAYG) pricing at $0.20/credit. |
| Recompute | Re-aggregate scores from existing data without making any API calls. Always free. |
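
The credit arithmetic implied by the table above (tier multipliers, free cache hits, Pro-plan overage) can be sketched as follows. This is an illustrative model only; the function names and call structure are not part of the PeerLM API.

```python
# Illustrative sketch of PeerLM's credit math, based on the glossary.
# Multipliers and rates come from the table; everything else is assumed.

TIER_MULTIPLIER = {"Standard": 1, "Advanced": 1, "Premium": 2, "Frontier": 3}

def run_credits(calls):
    """Credits for one run. `calls` is a list of (tier, cache_hit) pairs."""
    # Cache hits are free; every other call costs its tier's multiplier.
    return sum(0 if hit else TIER_MULTIPLIER[tier] for tier, hit in calls)

def pro_overage_cost(monthly_credits, allocation=1000, rate=0.10):
    """Dollar cost of credits beyond the Pro plan's monthly allocation."""
    return max(0, monthly_credits - allocation) * rate
```

For example, a run with one Standard call, one Frontier call, and one Premium cache hit would cost `run_credits([("Standard", False), ("Frontier", False), ("Premium", True)])` = 4 credits, and a Pro account that used 1,250 credits in a month would owe `pro_overage_cost(1250)` = $25.00 in overage.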