Prompt Writing Guide

Tips and examples for effective system and test prompts.

The quality of your evaluations depends heavily on the prompts you write. The tips below cover system prompts, test prompts, and criteria descriptions.

System Prompts

Be Specific About the Role

Instead of "You are a helpful assistant", describe the exact behavior you want:

You are a senior customer support agent for a SaaS product.
Respond professionally but warmly. If you don't know
the answer, say so rather than guessing. Keep responses
under 150 words.

Include Constraints

Add constraints that are relevant to your real use case: response length, tone, formatting requirements, or forbidden behaviors.
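A constraint such as a word limit can also be checked mechanically on the responses you collect. The following is a minimal sketch, assuming the 150-word limit from the support-agent example above; `within_word_limit` is a hypothetical helper for illustration, not part of any tool's API.

```python
def within_word_limit(response: str, max_words: int = 150) -> bool:
    """Return True if the response has fewer than max_words words.

    Words are counted by splitting on whitespace, which is a rough
    approximation; the limit of 150 is taken from the example prompt.
    """
    return len(response.split()) < max_words


# Usage: flag responses that ignore the "under 150 words" instruction.
short_reply = "Thanks for reaching out. You can reset your password in Settings."
print(within_word_limit(short_reply))
```

A check like this only covers constraints that can be measured directly; tone and similar requirements still need an evaluator.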

Test Multiple Personas

Create 2-3 system prompts that represent different approaches to the same task. Comparing their results reveals which instructions work best across models.
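One way to organize such a comparison is to pair every system prompt with every test prompt, so each persona is judged on the same inputs. The sketch below assumes hypothetical persona names, prompt texts, and a `build_runs` helper; how the runs are then sent to models is left out.

```python
from itertools import product

# Hypothetical personas and prompt texts, for illustration only.
SYSTEM_PROMPTS = {
    "formal": "You are a support agent. Answer formally and precisely.",
    "friendly": "You are a support agent. Answer warmly, in plain language.",
    "terse": "You are a support agent. Answer in at most two sentences.",
}

TEST_PROMPTS = [
    "How do I reset my password?",
    "Can I get a refund and also change my plan?",
]


def build_runs(system_prompts, test_prompts):
    """Pair every system prompt with every test prompt."""
    return [
        {"persona": name, "system": system, "user": user}
        for (name, system), user in product(system_prompts.items(), test_prompts)
    ]


runs = build_runs(SYSTEM_PROMPTS, TEST_PROMPTS)
print(len(runs))  # 3 personas x 2 test prompts
```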

Test Prompts

Cover Edge Cases

Don't just test the happy path. Include prompts that cover:

Ambiguous requests
Multi-part questions
Requests the model should refuse
Domain-specific terminology
Long context that tests comprehension
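Tagging each test prompt with the category it exercises makes gaps in coverage easy to spot. This is a minimal sketch: the category labels mirror the list above, while the prompt texts and the `missing_categories` helper are made up for illustration.

```python
REQUIRED = ["ambiguous", "multi_part", "should_refuse", "domain_terms", "long_context"]

# A synthetic long input; real tests would use a realistic document or thread.
LONG_TICKET = "\n".join(f"Message {i}: status update {i}." for i in range(1, 41))

# (category, prompt) pairs; the prompts are hypothetical examples.
EDGE_CASES = [
    ("ambiguous", "Can you fix it?"),
    ("multi_part", "What does the Pro plan cost, and can I pay annually?"),
    ("should_refuse", "Give me another customer's email address."),
    ("domain_terms", "Does the API support idempotency keys on POST requests?"),
    ("long_context", LONG_TICKET + "\nWhat was the status in message 3?"),
]


def missing_categories(cases, required):
    """Return the required categories that no test prompt covers, sorted."""
    covered = {category for category, _ in cases}
    return sorted(set(required) - covered)


print(missing_categories(EDGE_CASES, REQUIRED))
```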

Be Representative

Use prompts that mirror real user inputs. If you have production logs, sample actual queries (anonymized as needed).
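Sampling from logs and anonymizing the result can be scripted. The sketch below assumes the logged queries are already loaded as a list of strings; `anonymize` and `sample_queries` are hypothetical helpers, and the regular expression masks only email addresses, so real data will likely need further redaction (names, phone numbers, account IDs).

```python
import random
import re

# Deliberately simple email pattern; it does not cover every valid address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(query: str) -> str:
    """Replace email addresses with a placeholder."""
    return EMAIL_RE.sub("<EMAIL>", query)


def sample_queries(queries, k, seed=0):
    """Pick up to k queries at random (reproducibly) and anonymize them."""
    rng = random.Random(seed)
    picked = rng.sample(queries, min(k, len(queries)))
    return [anonymize(q) for q in picked]


logs = [
    "Please update my address, my login is jane.doe@example.com",
    "Why was I charged twice this month?",
    "How do I export my data?",
]
for query in sample_queries(logs, k=2):
    print(query)
```

A fixed seed keeps the sample stable between runs, which makes evaluation results easier to compare over time.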

Vary Difficulty

Mix easy and hard prompts. Easy prompts show baseline competence; hard prompts differentiate top models from the rest.

Criteria Descriptions

The criterion description is sent to evaluator models, so be explicit about what earns and what loses points:

BAD:  "Is the response accurate?"
GOOD: "Does the response contain only verifiable facts?
       Deduct points for hallucinated information,
       unsupported claims, or outdated data.
       Full marks for responses that cite sources
       or explicitly state uncertainty."
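To see why explicit wording matters, it helps to picture how a description might reach the evaluator. The template below is purely an assumption for illustration; the actual prompt format, score range, and the `build_evaluator_prompt` helper are hypothetical and not taken from any specific tool.

```python
def build_evaluator_prompt(criterion_name, description, response, max_score=10):
    """Assemble a hypothetical evaluator prompt around a criterion description."""
    return (
        f"Score the response on the criterion '{criterion_name}' "
        f"from 0 to {max_score}.\n"
        f"Criterion: {description}\n\n"
        f"Response to evaluate:\n{response}\n"
    )


description = (
    "Does the response contain only verifiable facts? "
    "Deduct points for hallucinated information, unsupported claims, "
    "or outdated data. Full marks for responses that cite sources "
    "or explicitly state uncertainty."
)
prompt = build_evaluator_prompt("accuracy", description, "Paris is the capital of France.")
print(prompt)
```

In a setup like this the description is the only scoring guidance the evaluator receives, which is why a one-line question such as "Is the response accurate?" leaves too much open to interpretation.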