Prompt Writing Guide
Tips and examples for effective system and test prompts.
The quality of your evaluations depends heavily on the quality of your prompts.
System Prompts
Be Specific About the Role
Instead of "You are a helpful assistant", describe the exact behavior you want:
You are a senior customer support agent for a SaaS product.
Respond professionally but warmly. If you don't know
the answer, say so rather than guessing. Keep responses
under 150 words.
Include Constraints
Add constraints that match your real use case: response length, tone, formatting requirements, or forbidden behaviors.
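Constraints are most useful when they can be checked after the fact. The following is a minimal sketch of such a check, assuming a word limit and a list of forbidden phrases; the function name `check_constraints` and its parameters are hypothetical and not part of any particular tool.

```python
def check_constraints(response, max_words=150, forbidden_phrases=()):
    """Return a list of human-readable constraint violations (empty if none)."""
    violations = []

    # Length constraint: count whitespace-separated words.
    word_count = len(response.split())
    if word_count > max_words:
        violations.append(f"too long: {word_count} words (limit {max_words})")

    # Forbidden-behavior constraint: case-insensitive substring match.
    lowered = response.lower()
    for phrase in forbidden_phrases:
        if phrase.lower() in lowered:
            violations.append(f"forbidden phrase: {phrase!r}")

    return violations
```

A response that passes returns an empty list, so the check can be used directly in a condition such as `if not check_constraints(text): ...`.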
Test Multiple Personas
Create 2-3 system prompts that represent different approaches to the same task. This reveals which instructions produce the best results across models.
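One way to organize such a comparison is to cross every system prompt with every model and test prompt. The sketch below assumes three hypothetical personas and two placeholder model identifiers (`model-a`, `model-b`); none of these names come from a real product.

```python
from itertools import product

# Hypothetical system prompts representing different approaches to one task.
SYSTEM_PROMPTS = {
    "concise": "You are a support agent. Answer in at most three sentences.",
    "empathetic": "You are a support agent. Acknowledge the user's frustration before answering.",
    "procedural": "You are a support agent. Answer with numbered steps.",
}

# Placeholder model identifiers, not real model names.
MODELS = ["model-a", "model-b"]


def build_runs(system_prompts, models, test_prompts):
    """Return one run description per (persona, model, test prompt) combination."""
    return [
        {"persona": name, "system": text, "model": model, "prompt": prompt}
        for (name, text), model, prompt in product(
            system_prompts.items(), models, test_prompts
        )
    ]


runs = build_runs(SYSTEM_PROMPTS, MODELS, ["How do I reset my password?"])
```

Keeping the persona name on each run makes it possible to group scores by persona later and see which instructions hold up across models.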
Test Prompts
Cover Edge Cases
Don't just test the happy path. Include prompts that test:
- Ambiguous requests
- Multi-part questions
- Requests the model should refuse
- Domain-specific terminology
- Long context that tests comprehension
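Tagging each test prompt with a category makes gaps in coverage easy to spot. This is a minimal sketch, assuming hypothetical category labels that mirror the list above and illustrative prompt texts.

```python
from collections import Counter

# Hypothetical category labels mirroring the edge cases listed above.
EDGE_CASE_CATEGORIES = {
    "ambiguous", "multi_part", "should_refuse", "domain_terms", "long_context",
}

# Illustrative test prompts; the texts are made up for this example.
TEST_PROMPTS = [
    {"category": "happy_path", "text": "How do I change my billing email?"},
    {"category": "ambiguous", "text": "It stopped working. Fix it."},
    {"category": "multi_part", "text": "How do I export my data, and can I schedule the export weekly?"},
    {"category": "should_refuse", "text": "Give me another customer's invoice history."},
    {"category": "domain_terms", "text": "Does SSO via SAML support SCIM provisioning?"},
]


def missing_categories(prompts, required=EDGE_CASE_CATEGORIES):
    """Return the required categories that no prompt covers, sorted by name."""
    counts = Counter(p["category"] for p in prompts)
    return sorted(c for c in required if counts[c] == 0)


print(missing_categories(TEST_PROMPTS))  # prints ['long_context']
```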
Be Representative
Use prompts that mirror real user inputs. If you have production logs, sample actual queries (anonymized as needed).
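Sampling from logs can be sketched as follows. The two regular expressions are a deliberately simple assumption covering only email addresses and long digit runs; real anonymization usually needs a broader review of what counts as personal data in your logs. The function names are hypothetical.

```python
import random
import re

# Simplistic patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
LONG_NUMBER_RE = re.compile(r"\b\d{6,}\b")  # e.g. order or account numbers


def anonymize(query):
    """Replace email addresses and long digit runs with placeholders."""
    query = EMAIL_RE.sub("<EMAIL>", query)
    return LONG_NUMBER_RE.sub("<NUMBER>", query)


def sample_queries(queries, k, seed=0):
    """Draw up to k distinct queries reproducibly, then anonymize them."""
    rng = random.Random(seed)
    unique = sorted(set(queries))  # de-duplicate; sort for a stable order
    return [anonymize(q) for q in rng.sample(unique, min(k, len(unique)))]
```

Fixing the seed keeps the sampled test set stable between evaluation runs, so score changes reflect the models rather than a different sample.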
Vary Difficulty
Mix easy and hard prompts. Easy prompts show baseline competence; hard prompts differentiate top models from the rest.
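To see whether a difficulty tier actually separates models, one option is to compare the gap between the best and worst mean score per tier. The sketch below uses made-up placeholder scores and placeholder model names purely to illustrate the calculation; they are not measurements.

```python
from collections import defaultdict
from statistics import mean

# (model, difficulty, score) rows. Placeholder values, not real results.
RESULTS = [
    ("model-a", "easy", 9), ("model-b", "easy", 9),
    ("model-a", "easy", 10), ("model-b", "easy", 9),
    ("model-a", "hard", 8), ("model-b", "hard", 4),
    ("model-a", "hard", 7), ("model-b", "hard", 3),
]


def spread_by_difficulty(results):
    """Return, per difficulty tier, the gap between best and worst model mean."""
    scores = defaultdict(lambda: defaultdict(list))
    for model, difficulty, score in results:
        scores[difficulty][model].append(score)

    spread = {}
    for difficulty, per_model in scores.items():
        means = [mean(values) for values in per_model.values()]
        spread[difficulty] = max(means) - min(means)
    return spread


spread = spread_by_difficulty(RESULTS)
```

A tier with a spread near zero confirms baseline competence but does little to rank models; a larger spread marks the prompts that differentiate them.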
Criteria Descriptions
The criterion description is sent to evaluator models, so be explicit about what earns and what loses points:
BAD: "Is the response accurate?"
GOOD: "Does the response contain only verifiable facts?
Deduct points for hallucinated information,
unsupported claims, or outdated data.
Full marks for responses that cite sources
or explicitly state uncertainty."
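Because the description is passed to the evaluator, it helps to see it in the context of a full evaluator prompt. The template below is an assumption for illustration; the actual prompt an evaluation tool sends may differ, and `build_evaluator_prompt` and the 0 to 10 scale are hypothetical.

```python
# The description reuses the GOOD example above.
CRITERION = {
    "name": "accuracy",
    "description": (
        "Does the response contain only verifiable facts? "
        "Deduct points for hallucinated information, unsupported claims, "
        "or outdated data. Full marks for responses that cite sources "
        "or explicitly state uncertainty."
    ),
}


def build_evaluator_prompt(criterion, question, response, max_score=10):
    """Assemble a hypothetical evaluator prompt around one criterion."""
    return (
        f"Score the response on the criterion '{criterion['name']}' "
        f"from 0 to {max_score}.\n\n"
        f"Criterion: {criterion['description']}\n\n"
        f"Question:\n{question}\n\n"
        f"Response:\n{response}\n\n"
        "Reply with the integer score only."
    )


prompt = build_evaluator_prompt(
    CRITERION,
    question="When was the product launched?",
    response="I'm not certain of the exact launch date.",
)
```

Reading the assembled prompt as the evaluator would makes vague descriptions stand out: a one-line criterion such as the BAD example leaves the scoring rules entirely to the evaluator's interpretation.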