Setting Up Your Library

Create system prompts, test prompts, and datasets.

The Library holds the prompts used in your evaluations. There are three types of library items.

System Prompts

System prompts define the persona or instructions given to a model before it sees the user's message. Think of them as the "role" the model plays during evaluation.

Name — a short label (e.g., "Customer Support Agent")
Description — optional context for your team
System Prompt — the full text sent as the system message
Tags — for filtering and organization

Test Prompts

Test prompts are the actual user messages models respond to. Each test prompt is an atomic evaluation unit — models are scored on each one independently.

Title — a descriptive label
Prompt — the message sent to the model
Dataset — optional grouping (see below)
Tags — for filtering

Datasets

Datasets are lightweight collections that group test prompts. When building a suite, you can select an entire dataset instead of picking individual prompts. A test prompt can belong to at most one dataset.

Bulk Import

You can import up to 500 items at once for system prompts and test prompts. Use the import button on the respective library page.

Tip: Editing a prompt changes its content hash, which automatically invalidates cached responses. Future runs will re-generate responses for changed prompts while reusing cache for unchanged ones.