Evaluation Triggers
Every evaluation has a trigger configuration that determines which traces it evaluates and how. Triggers give you precise control over what an evaluation monitors without modifying the evaluation script itself.

How Triggers Work
When a trace completes (after a debounce window with no new spans), Latitude checks it against every active evaluation’s trigger configuration. Trigger checks are evaluated in a specific order:

- Filter: Does the trace match the evaluation’s filter criteria?
- Sampling: Does it pass the sample rate check?
- Turn / Debounce: Which turn does the evaluation target, and should execution be debounced?
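The checks above run in order, and a trace is evaluated only if it passes all of them. The following is a minimal sketch of that pipeline; the names and shapes (`Trigger`, `Trace`, `shouldEvaluate`) are illustrative assumptions, not Latitude's actual API:

```typescript
// Hypothetical shapes for illustration only.
interface Trace {
  sessionTurn: "first" | "middle" | "last";
  metadata: Record<string, string>;
}

interface Trigger {
  filter?: (trace: Trace) => boolean; // optional filter predicate
  sampleRate: number;                 // 0–100
  turn: "every" | "first" | "last";
  debounceSeconds: number;
}

function shouldEvaluate(trace: Trace, trigger: Trigger, rand = Math.random()): boolean {
  // 1. Filter: skip traces that don't match the filter criteria.
  if (trigger.filter && !trigger.filter(trace)) return false;
  // 2. Sampling: run on only sampleRate% of matching traces.
  if (rand * 100 >= trigger.sampleRate) return false;
  // 3. Turn: only run on the turn the evaluation targets.
  if (trigger.turn === "first" && trace.sessionTurn !== "first") return false;
  if (trigger.turn === "last" && trace.sessionTurn !== "last") return false;
  // Execution may still be delayed by debounceSeconds before it runs.
  return true;
}
```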
Trigger Fields
Filter
Select which traces the evaluation monitors using any combination of the shared filters:

- Status: Only evaluate traces with errors, or only successful traces
- Models: Only evaluate traces that used specific models
- Providers: Only evaluate traces from specific providers
- Tags: Only evaluate traces with specific tags
- Cost: Only evaluate traces above or below a cost threshold
- Duration: Only evaluate traces above or below a duration threshold
- Custom metadata: Filter on any metadata.* fields your application sends
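As an example, a filter combining several of these criteria could be expressed as a predicate like the sketch below; the field names (`costUsd`, `metadata.environment`, and so on) are assumptions for illustration, not Latitude's actual schema:

```typescript
// Illustrative only: matches production traces that used gpt-4o
// and cost more than $0.50.
type TraceInfo = {
  status: "ok" | "error";
  model: string;
  costUsd: number;
  metadata: Record<string, string>;
};

function matchesFilter(t: TraceInfo): boolean {
  return (
    t.metadata["environment"] === "production" && // custom metadata filter
    t.model === "gpt-4o" &&                       // model filter
    t.costUsd > 0.5                               // cost threshold
  );
}
```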
Sampling
The percentage of matching traces that the evaluation actually runs against, from 0 to 100. This controls cost and processing time while still giving you statistical coverage.

- Setting sampling to 0 effectively pauses the evaluation.
- New evaluations generated from issues default to 10% sampling.
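The sample-rate check itself is simple probability. A sketch (not Latitude's implementation):

```typescript
// A rate of 25 evaluates roughly 25% of matching traces.
// rate 0 never passes; rate 100 always passes.
function passesSampling(rate: number, rand = Math.random()): boolean {
  return rand * 100 < rate;
}
```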
Turn
Controls which trace or turn the evaluation runs on:

- every: Run on every completed trace (the default)
- first: Run only on the first trace/turn in a session
- last: Run only on the last trace/turn in a session
Debounce
A debounce time in seconds. When set, the evaluation waits for the debounce period after the trace completes before executing. This is useful for batching or rate-limiting evaluation execution.

Trigger Examples
Monitor all production traces for jailbreak attempts:

- Filter: metadata.environment = “production”
- Sampling: 100%
- Turn: every
- Debounce: 0
Sample a quarter of high-cost traces:

- Filter: cost > $0.50
- Sampling: 25%
- Turn: every
- Debounce: 0
Evaluate the final turn of a 10% sample of all sessions:

- Filter: (empty: match all)
- Sampling: 10%
- Turn: last
- Debounce: 0
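Written as configuration objects, the three examples above might look like this; the shape is a hypothetical illustration, not Latitude's actual config schema:

```typescript
// Hypothetical trigger configs for the three examples above.
const jailbreakMonitor = {
  filter: { metadata: { environment: "production" } },
  sampling: 100,
  turn: "every",
  debounce: 0,
};

const highCostSample = {
  filter: { cost: { gt: 0.5 } }, // dollars
  sampling: 25,
  turn: "every",
  debounce: 0,
};

const sessionEndReview = {
  filter: {}, // empty: match all traces
  sampling: 10,
  turn: "last", // only the final turn of each session
  debounce: 0,
};
```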
Triggers and Annotation Queues
Triggers work in concert with annotation queues. A common pattern:

- An evaluation monitors traces with a broad trigger
- Failed scores feed into issue discovery
- A linked annotation queue surfaces failing traces for human review
- Human annotations measure alignment with the evaluation
Next Steps
- Alignment: How human annotations calibrate evaluations
- Evaluations Overview: How evaluation scripts work
- Annotation Queues: The human side of the feedback loop