# Scores
Scores are the universal measurement unit in Latitude. Every verdict on your agent’s interactions, whether from an automated evaluation, a human annotation, or your own code, is a score. Everything in Latitude’s reliability system is built on top of scores: issues, evaluation dashboards, annotation workflows, simulations, and analytics.

## What Is a Score
A score is a verdict attached to a trace. Every score has:

| Field | Description |
|---|---|
| Value | A number between 0 and 1 |
| Pass / Fail | Whether the interaction met expectations |
| Feedback | Text explaining the verdict |
| Source | Where the score came from: evaluation, annotation, or custom |
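As a mental model, the fields in the table above can be captured in a small record type. This is an illustrative sketch only; the class name, field names, and validation rules here are assumptions, not Latitude’s actual schema:

```python
from dataclasses import dataclass


@dataclass
class Score:
    """Illustrative model of a score record (not Latitude's real schema)."""
    value: float    # a number between 0 and 1
    passed: bool    # whether the interaction met expectations
    feedback: str   # text explaining the verdict
    source: str     # "evaluation", "annotation", or "custom"

    def __post_init__(self):
        # Enforce the invariants the table above describes.
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("value must be between 0 and 1")
        if self.source not in {"evaluation", "annotation", "custom"}:
            raise ValueError("unknown score source")
```

For example, `Score(value=0.9, passed=True, feedback="Accurate answer", source="evaluation")` would represent a passing automated verdict.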
## Score Sources
Every score has a source that identifies how it was produced:

### Evaluation Scores
Produced by automated scripts that Latitude runs on your traces. When a trace matches an evaluation’s trigger configuration, the evaluation executes and writes a score. Evaluation scores are the backbone of continuous monitoring: they run automatically on every matching trace, giving you real-time quality visibility.

### Annotation Scores
Produced by human reviewers. When someone annotates a trace through an annotation queue or inline from the trace view, their verdict becomes a score. Annotation scores serve as ground truth: they represent what a human actually thinks about the agent’s behavior, and they anchor evaluation alignment metrics.

### Custom Scores
Submitted by your own code through the Latitude API. Use custom scores for domain-specific quality signals:

- User satisfaction ratings
- Task completion metrics
- Business KPIs (conversion rates, resolution rates)
- Downstream validation (was the agent’s output actually correct?)
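A custom score submission might look roughly like the sketch below. The endpoint URL, payload field names, and auth header are assumptions for illustration; consult the Scores API reference for the real contract:

```python
import json
from urllib import request

# Placeholder endpoint -- the real URL comes from the Scores API docs.
API_URL = "https://api.latitude.example/v1/scores"


def build_custom_score(trace_id, value, passed, feedback):
    """Build a custom-score payload (field names are assumed)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("score value must be between 0 and 1")
    return {
        "traceId": trace_id,
        "value": value,
        "passed": passed,
        "feedback": feedback,
        "source": "custom",
    }


def submit_custom_score(payload, api_key):
    """POST the payload to the (hypothetical) scores endpoint."""
    req = request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:  # network call; needs a real endpoint
        return json.load(resp)
```

For downstream validation, your pipeline could call `build_custom_score(trace_id, 1.0, True, "Output verified against ground truth")` once it has confirmed the agent’s answer.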
## How Scores Work
Scores from human annotations start as drafts. A draft score:

- Persists immediately, so it survives page refreshes
- Is visible in annotation queue review and in-progress editing
- Does not appear in analytics, issue discovery, or alignment metrics
- Can be edited and revised while still in draft state
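The draft lifecycle above can be modeled conceptually like this. `finalize` stands in for whatever submission step completes the annotation; it is not a documented Latitude API:

```python
class DraftScore:
    """Conceptual model of the draft-score lifecycle described above."""

    def __init__(self, value, feedback):
        self.value = value
        self.feedback = feedback
        # Drafts persist immediately but stay out of analytics,
        # issue discovery, and alignment metrics until finalized.
        self.finalized = False

    def edit(self, value=None, feedback=None):
        """Revise the verdict -- allowed only while still a draft."""
        if self.finalized:
            raise RuntimeError("finalized scores are immutable")
        if value is not None:
            self.value = value
        if feedback is not None:
            self.feedback = feedback

    def finalize(self):
        """After this, the score counts toward analytics and issues."""
        self.finalized = True
```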
## How Scores Flow Through the System
Scores feed forward into every part of Latitude:

- Issue Discovery: When scores fail, Latitude groups similar failures into issues: named, trackable failure patterns your team can investigate and resolve.
- Evaluation Generation: Issues can generate monitoring evaluations that watch for that failure pattern on live traffic, producing more scores.
- Alignment: Annotation scores are compared against evaluation scores for the same traces, producing alignment metrics that tell you how well automated evaluations match human judgment.
- Analytics: Finalized scores feed into time-series dashboards showing quality trends across your project.
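As a sketch of what an alignment metric measures, consider simple pass/fail agreement over the traces both sources scored. This computation is illustrative only; Latitude’s actual alignment metrics may be more sophisticated:

```python
def alignment_rate(annotation_verdicts, evaluation_verdicts):
    """Fraction of shared traces where the automated evaluation's
    pass/fail verdict agrees with the human annotation's.

    Both arguments map trace IDs to booleans (True = pass).
    Returns None when no traces were scored by both sources.
    """
    shared = annotation_verdicts.keys() & evaluation_verdicts.keys()
    if not shared:
        return None
    agreements = sum(
        annotation_verdicts[t] == evaluation_verdicts[t] for t in shared
    )
    return agreements / len(shared)
```

A high rate suggests the evaluation can stand in for human review on live traffic; a low rate signals the evaluation needs tuning against the annotated ground truth.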
## Next Steps
- Annotations: How human reviewers create scores
- Evaluations: How automated scripts create scores
- Issues: How failed scores become trackable failure patterns
- Analytics: Visualizing score trends
- Scores API: Submit custom scores programmatically