MCP Server
Run evaluations from Claude Desktop, Cursor, or any MCP client.
The PeerLM MCP (Model Context Protocol) server lets you run evaluations directly from Claude Desktop, Cursor, or any MCP-compatible client. Available on Pro and Enterprise plans.
Setup
Add PeerLM to your MCP client configuration. No installation needed — it runs via npx.
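If you would rather script this step than edit a config file by hand, the following is a minimal sketch of merging the peerlm entry into an existing client configuration without overwriting other servers. The entry mirrors the configuration shown in this section; the helper names add_peerlm_server and update_config_file are hypothetical, and the API key is left as the same placeholder used in this page.

```python
import json
from pathlib import Path

# Server entry as shown in this section; the key value is a placeholder.
PEERLM_ENTRY = {
    "command": "npx",
    "args": ["-y", "@peerlm/mcp"],
    "env": {"PEERLM_API_KEY": "plm_live_..."},
}


def add_peerlm_server(config: dict, entry: dict = PEERLM_ENTRY) -> dict:
    """Return a copy of config with the peerlm server added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers["peerlm"] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Hypothetical helper: read a client config file, merge, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_peerlm_server(existing), indent=2) + "\n")
```

For example, update_config_file(Path(".cursor/mcp.json")) would apply the merge to a Cursor project config. Back up the file first if it already contains other servers.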
Claude Desktop
On macOS, edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "peerlm": {
      "command": "npx",
      "args": ["-y", "@peerlm/mcp"],
      "env": { "PEERLM_API_KEY": "plm_live_..." }
    }
  }
}
Cursor
Add to .cursor/mcp.json in your project root:
{
  "mcpServers": {
    "peerlm": {
      "command": "npx",
      "args": ["-y", "@peerlm/mcp"],
      "env": { "PEERLM_API_KEY": "plm_live_..." }
    }
  }
}
Available Tools
The MCP server provides 9 tools:
Read Tools
- list_suites — List all evaluation suites in your workspace
- get_suite — Get full configuration for a specific suite
- list_models — Browse available models, optionally filtered by provider or tier
- get_usage — Check your plan, credit balance, and usage
Create Tools
- create_system_prompt — Create a system prompt (persona) in your library
- create_test_prompt — Create a test prompt (task) in your library
- create_suite — Create an evaluation suite with models, prompts, and criteria
Run Tools
- run_eval — Trigger an evaluation run for a suite
- get_results — Check run status and view results (leaderboard, scores, recommendations)
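One way to confirm that a client is seeing all nine tools is to compare the server's advertised tool names against the list above. The sketch below assumes you already have the parsed JSON-RPC response to an MCP tools/list request (a result object containing a tools array of named entries); how you obtain that response depends on your client, and the function name missing_tools is hypothetical.

```python
# Tool names as listed in this section, grouped by category.
EXPECTED_TOOLS = {
    # Read tools
    "list_suites", "get_suite", "list_models", "get_usage",
    # Create tools
    "create_system_prompt", "create_test_prompt", "create_suite",
    # Run tools
    "run_eval", "get_results",
}


def missing_tools(tools_list_response: dict) -> set:
    """Return the expected tool names absent from a tools/list response."""
    tools = tools_list_response.get("result", {}).get("tools", [])
    advertised = {tool["name"] for tool in tools}
    return EXPECTED_TOOLS - advertised
```

An empty set means every documented tool was advertised. Anything else names the tools the client cannot currently call.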
Example Workflow
Ask your AI assistant something like: "Compare Claude Sonnet, GPT-5.2, and Gemini 2.5 for extracting structured data from medical notes."
The assistant will:
- Use list_models to find the exact model IDs
- Use create_system_prompt to define the persona
- Use create_test_prompt to create sample tasks
- Use create_suite to wire everything together
- Use run_eval to start the evaluation
- Use get_results to retrieve the leaderboard
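Under the hood, each of these steps is an MCP tools/call request sent by the client. The sketch below builds the six requests in the order described above, as JSON-RPC 2.0 messages. The arguments are left empty because the input schema of each PeerLM tool is not documented on this page; a real client fills them in from the schemas the server advertises. The helper names tool_call and workflow_requests are hypothetical.

```python
# Tool names in the order the assistant is described as calling them.
WORKFLOW = [
    "list_models",
    "create_system_prompt",
    "create_test_prompt",
    "create_suite",
    "run_eval",
    "get_results",
]


def tool_call(request_id: int, name: str, arguments: dict = None) -> dict:
    """Build one MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def workflow_requests() -> list:
    """Return the six requests with sequential ids; arguments are placeholders."""
    return [tool_call(i, name) for i, name in enumerate(WORKFLOW, start=1)]
```

In practice the calls are not independent: the assistant passes values returned by earlier steps (for example, model IDs from list_models) into later ones, and may call get_results more than once while a run is still in progress.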
Requirements
- API key — generate one from Settings > API Keys with read-write scope
- Plan — Pro or Enterprise (Free plan does not include API access)
- Node.js — version 20 or later (for npx)
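The Node.js requirement can be checked before you touch any client configuration. This is a minimal sketch, assuming node is on your PATH and prints its version in the usual v<major>.<minor>.<patch> form; the function names are hypothetical. It does not validate the API key or the plan, which can only be confirmed against your PeerLM account.

```python
import re
import shutil
import subprocess
from typing import Optional

MIN_NODE_MAJOR = 20  # minimum version stated in the requirements above


def node_major(version_output: str) -> Optional[int]:
    """Extract the major version from output such as 'v20.11.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def node_is_supported(version_output: str) -> bool:
    """True if the reported Node.js version meets the minimum."""
    major = node_major(version_output)
    return major is not None and major >= MIN_NODE_MAJOR


def installed_node_version() -> Optional[str]:
    """Return the output of `node --version`, or None if node is not found."""
    if shutil.which("node") is None:
        return None
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A typical check is to call installed_node_version() and, if it returns a string, pass it to node_is_supported().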
Tip: The MCP client (Claude Desktop, Cursor) acts as the AI: it generates system prompts, test prompts, and criteria from your description, then creates them via the API, so no server-side AI generation is needed.