Pass/Fail Thresholds
Set minimum scores and latency limits for automated gating.
Thresholds let you define minimum quality standards. When configured, each run produces a Pass or Fail verdict.
Threshold Types
- Minimum overall score — the lowest acceptable weighted score (e.g., 7.5 out of 10)
- Per-criteria minimums — minimum score for specific criteria (e.g., Accuracy must be at least 8.0)
- Maximum latency — the slowest acceptable average response time in milliseconds
How Verdicts Work
A run passes if at least one model meets every threshold condition on its own. If no single model meets all conditions, the run fails, even when each condition is met by some model.
Threshold violations are stored with the run results and shown in the summary. You can see exactly which thresholds were violated and by how much.
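The verdict rule and the violation margins can be illustrated with a minimal sketch. It is not the product's implementation: `Thresholds`, `ModelResult`, `violations`, and `verdict` are hypothetical names, and treating a missing criterion score as 0 is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Thresholds:
    """Hypothetical container for the three threshold types."""
    min_overall: Optional[float] = None
    per_criteria_min: Dict[str, float] = field(default_factory=dict)
    max_latency_ms: Optional[float] = None


@dataclass
class ModelResult:
    """Hypothetical per-model result for one run."""
    name: str
    overall: float               # weighted overall score
    criteria: Dict[str, float]   # score per criterion
    avg_latency_ms: float        # average response time


def violations(result: ModelResult, t: Thresholds) -> List[Tuple[str, float]]:
    """Return (threshold, margin) pairs for every threshold the model misses."""
    found = []
    if t.min_overall is not None and result.overall < t.min_overall:
        found.append(("overall", t.min_overall - result.overall))
    for criterion, minimum in t.per_criteria_min.items():
        # Assumption: a criterion with no score counts as 0.
        score = result.criteria.get(criterion, 0.0)
        if score < minimum:
            found.append((criterion, minimum - score))
    if t.max_latency_ms is not None and result.avg_latency_ms > t.max_latency_ms:
        found.append(("latency_ms", result.avg_latency_ms - t.max_latency_ms))
    return found


def verdict(results: List[ModelResult], t: Thresholds) -> str:
    """Pass if at least one model has no violations, otherwise Fail."""
    return "Pass" if any(not violations(r, t) for r in results) else "Fail"


t = Thresholds(min_overall=7.5, per_criteria_min={"Accuracy": 8.0}, max_latency_ms=1000)
a = ModelResult("model-a", 8.0, {"Accuracy": 8.5}, 1200)  # misses latency by 200 ms
b = ModelResult("model-b", 7.0, {"Accuracy": 8.5}, 800)   # misses overall score by 0.5
print(verdict([a, b], t))  # Fail
```

In this example every condition is met by one of the two models, yet the run fails because neither model meets all of them at once.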
Use Cases
- CI/CD gating — combine with Auto-Run to automatically flag regressions when a model update drops below your threshold
- Vendor qualification — set minimums that models must meet before being approved for production use
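For CI/CD gating, a pipeline step typically needs an exit code rather than a verdict label. The sketch below assumes the run summary is available as JSON with a top-level `verdict` field; that shape and the `exit_code_for` helper are hypothetical and are not described in this documentation.

```python
import json


def exit_code_for(summary_json: str) -> int:
    """Map a run summary to a process exit code: 0 for Pass, 1 otherwise."""
    summary = json.loads(summary_json)
    # Assumption: the summary carries a "verdict" field set to "Pass" or "Fail".
    return 0 if summary.get("verdict") == "Pass" else 1


print(exit_code_for('{"verdict": "Pass"}'))  # 0
print(exit_code_for('{"verdict": "Fail"}'))  # 1
```

Treating anything other than an explicit Pass as a failure keeps the gate closed when the summary is incomplete.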