MCP Server

Run evaluations from Claude Desktop, Cursor, or any MCP client.

The PeerLM MCP (Model Context Protocol) server exposes your PeerLM workspace as a set of tools, so an MCP-compatible client can create and run evaluations on your behalf. It is available on Pro and Enterprise plans.

Setup

Add PeerLM to your MCP client configuration. No separate installation is needed: the client launches the server with npx, which fetches the @peerlm/mcp package the first time it runs.

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location) and set PEERLM_API_KEY to your own key:

{
  "mcpServers": {
    "peerlm": {
      "command": "npx",
      "args": ["-y", "@peerlm/mcp"],
      "env": { "PEERLM_API_KEY": "plm_live_..." }
    }
  }
}
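If the config file already lists other MCP servers, the "peerlm" entry has to be merged in rather than pasted over the whole file. The Python sketch below shows one way to do that merge. The function name add_peerlm_server is made up for illustration, and the same logic should carry over to any client file that uses the mcpServers layout shown above.

```python
import json
from pathlib import Path


def add_peerlm_server(config_path: Path, api_key: str) -> dict:
    """Merge a "peerlm" entry into an MCP client config, keeping other entries."""
    # Start from the existing config if the file exists and is non-empty.
    text = config_path.read_text(encoding="utf-8").strip() if config_path.exists() else ""
    config = json.loads(text) if text else {}

    # Reuse the existing "mcpServers" object so other servers are preserved.
    servers = config.setdefault("mcpServers", {})

    # Same entry as in the snippet above; only the key value is filled in.
    servers["peerlm"] = {
        "command": "npx",
        "args": ["-y", "@peerlm/mcp"],
        "env": {"PEERLM_API_KEY": api_key},
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For Claude Desktop on macOS, config_path would be Path.home() / "Library/Application Support/Claude/claude_desktop_config.json". Restart the client afterwards so it rereads the file.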

Cursor

Add the same entry to .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "peerlm": {
      "command": "npx",
      "args": ["-y", "@peerlm/mcp"],
      "env": { "PEERLM_API_KEY": "plm_live_..." }
    }
  }
}

Available Tools

The MCP server provides 9 tools in three groups:

Read Tools

list_suites — List all evaluation suites in your workspace
get_suite — Get full configuration for a specific suite
list_models — Browse available models, optionally filtered by provider or tier
get_usage — Check your plan, credit balance, and usage

Create Tools

create_system_prompt — Create a system prompt (persona) in your library
create_test_prompt — Create a test prompt (task) in your library
create_suite — Create an evaluation suite with models, prompts, and criteria

Run Tools

run_eval — Trigger an evaluation run for a suite
get_results — Check run status and view results (leaderboard, scores, recommendations)
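You never write these calls by hand, but it can help to see what the client sends. In MCP, a tool invocation is a JSON-RPC 2.0 request with the method tools/call, carrying the tool name and an arguments object. The Python sketch below builds such a message. The helper name build_tool_call and the suite_id argument are assumptions for illustration; the real argument names are whatever the server declares in its tool schemas.

```python
import json
from itertools import count
from typing import Optional

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def build_tool_call(name: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(request)


# Hypothetical argument name: "suite_id" is a placeholder, not a documented field.
message = build_tool_call("run_eval", {"suite_id": "YOUR_SUITE_ID"})
```

The client sends a message of this shape to the server process started by npx and reads the result back over the same channel.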

Example Workflow

Ask your AI assistant something like: "Compare Claude Sonnet, GPT-5.2, and Gemini 2.5 for extracting structured data from medical notes."

The assistant will typically chain the tools in this order:

Use list_models to find the exact model IDs
Use create_system_prompt to define the persona
Use create_test_prompt to create sample tasks
Use create_suite to wire everything together
Use run_eval to start the evaluation
Use get_results to retrieve the leaderboard
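The six steps above can be sketched as a chain of tool calls in which each step feeds IDs into the next. This is an illustration of the ordering only, written in Python against a generic call_tool function that you would supply. Every argument and response field name in it (models, id, name, content, model_ids, system_prompt_id, test_prompt_ids, criteria, suite_id, run_id) is a placeholder, not the server's documented schema.

```python
from typing import Callable

# A tool caller takes (tool_name, arguments) and returns the tool result as a dict.
ToolCaller = Callable[[str, dict], dict]


def run_comparison(
    call_tool: ToolCaller,
    wanted_models: list,
    persona: str,
    tasks: list,
    criteria: list,
) -> dict:
    """Walk the workflow steps in order. All field names are hypothetical."""
    # 1. Look up the exact model IDs for the names the user mentioned.
    models = call_tool("list_models", {})["models"]
    model_ids = [m["id"] for m in models if m["name"] in wanted_models]

    # 2. Create the persona.
    system_prompt = call_tool("create_system_prompt", {"content": persona})

    # 3. Create one test prompt per sample task.
    test_prompts = [call_tool("create_test_prompt", {"content": t}) for t in tasks]

    # 4. Wire models, prompts, and criteria into a suite.
    suite = call_tool("create_suite", {
        "model_ids": model_ids,
        "system_prompt_id": system_prompt["id"],
        "test_prompt_ids": [p["id"] for p in test_prompts],
        "criteria": criteria,
    })

    # 5. Start the run, then 6. fetch its results.
    run = call_tool("run_eval", {"suite_id": suite["id"]})
    return call_tool("get_results", {"run_id": run["id"]})
```

In practice the assistant makes these decisions itself; the sketch only makes the data flow between the tools explicit.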

Requirements

API key — generate one from Settings > API Keys with read-write scope
Plan — Pro or Enterprise (Free plan does not include API access)
Node.js — version 20 or later (for npx)
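The Node.js requirement is the one item here that can be checked locally before touching the client config. A small Python sketch of such a preflight check follows; the function names are illustrative, and it cannot verify your plan or the scope of your API key.

```python
import shutil
import subprocess


def parse_node_major(version_output: str) -> int:
    """Extract the major version from `node --version` output such as "v20.11.1"."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def check_node(min_major: int = 20) -> list:
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    if shutil.which("node") is None or shutil.which("npx") is None:
        problems.append("node and/or npx not found on PATH")
        return problems

    # `node --version` prints a single line like "v20.11.1".
    result = subprocess.run(["node", "--version"], capture_output=True, text=True)
    if parse_node_major(result.stdout) < min_major:
        problems.append(
            "Node.js " + result.stdout.strip() + " is older than v" + str(min_major)
        )
    return problems
```

Calling check_node() and printing the returned list is enough to confirm that npx will be able to start the server.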

Tip: The MCP client (Claude, Cursor) acts as the AI: it generates system prompts, test prompts, and criteria from your description, then creates them through the API. No server-side AI generation is needed.