Glossary

Definitions of key terms used throughout PeerLM.

| Term | Definition |
| --- | --- |
| Generator | A model that produces responses to your prompts. The model being evaluated. |
| Evaluator | A model that judges the quality of generator responses against your criteria. |
| Suite | A saved evaluation configuration: which models, prompts, and criteria to use. |
| Run | A single execution of a suite. Produces a set of results. |
| Criteria | The dimensions on which responses are scored (e.g., Accuracy, Clarity). |
| Weight | How much a criterion or model tier contributes to the overall score or credit cost. |
| Credit Multiplier | The cost multiplier per model call based on tier (Standard=1x, Advanced=1x, Premium=2x, Frontier=3x). |
| Tier | A model's pricing/capability category: Standard, Advanced, Premium, or Frontier. |
| Baseline | A run marked as the reference point for comparing future runs of the same suite. |
| Cache Hit | A response reused from a previous run because the model version and prompt content are identical. Free. |
| Deterministic Mode | A setting that attempts temperature=0 and fixed seed for reproducible outputs. |
| System Prompt | Instructions sent as the system message to define the model's role or behavior. |
| Test Prompt | The user message a model responds to. The unit of evaluation. |
| Dataset | A named collection of test prompts for easy selection in suites. |
| Auto-Run | Automatic re-evaluation triggered by model updates, new models, or a schedule. |
| Overage | Credits consumed beyond your Pro plan's 1,000 monthly allocation, billed at $0.10/credit. Free plan uses PAYG at $0.20/credit. |
| Recompute | Re-aggregate scores from existing data without making any API calls. Always free. |
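
To make the Credit Multiplier, Cache Hit, and Overage entries concrete, here is a rough sketch of how a cost estimate could be put together. The tier multipliers, the 1,000-credit Pro allocation, and the per-credit prices come from the definitions above. Everything else is an assumption made for illustration: the function names are hypothetical, and the sketch assumes one base credit per model call before the multiplier is applied, which this glossary does not state. It is not PeerLM's actual billing logic.

```python
# Tier multipliers as listed under "Credit Multiplier".
CREDIT_MULTIPLIER = {"Standard": 1, "Advanced": 1, "Premium": 2, "Frontier": 3}

PRO_MONTHLY_CREDITS = 1000   # Pro plan monthly allocation
PRO_OVERAGE_CENTS = 10       # $0.10 per credit beyond the allocation
FREE_PAYG_CENTS = 20         # $0.20 per credit on the Free plan (PAYG)


def credits_for_calls(calls_by_tier, cache_hits_by_tier=None):
    """Estimate credits for a run.

    ASSUMPTION: each non-cached model call costs one base credit
    times its tier multiplier. Cache hits are free, so they are
    subtracted before the multiplier is applied.
    """
    cache_hits_by_tier = cache_hits_by_tier or {}
    total = 0
    for tier, calls in calls_by_tier.items():
        billable = max(0, calls - cache_hits_by_tier.get(tier, 0))
        total += billable * CREDIT_MULTIPLIER[tier]
    return total


def pro_overage_cents(credits_used_this_month):
    """Overage charge in cents for a Pro plan, zero if within the allocation."""
    extra = max(0, credits_used_this_month - PRO_MONTHLY_CREDITS)
    return extra * PRO_OVERAGE_CENTS


def free_payg_cents(credits_used):
    """Pay-as-you-go charge in cents for the Free plan."""
    return credits_used * FREE_PAYG_CENTS


if __name__ == "__main__":
    run_credits = credits_for_calls(
        {"Standard": 10, "Frontier": 5},
        cache_hits_by_tier={"Frontier": 2},
    )
    print("credits for this run:", run_credits)
    print("Pro overage at 1,200 credits (cents):", pro_overage_cents(1200))
```

Amounts are kept in integer cents to avoid floating-point rounding in the totals. A Recompute would contribute nothing here, since it makes no model calls.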