Scoring

A Scorer maps a recorded TrialRecord (plus the scene's Target) to a Score. Scorers read the recorded trajectory — never a live environment — so scoring is reproducible from a saved log.

Builtin scorers

from robolens.scorer import (
    success_at_end,        # 1.0 iff the episode terminated with reason "success"
    episode_length,        # number of steps taken
    min_distance_to_goal,  # closest the effector got (reads StepResult.info["distance"])
    reached_goal_state,    # success iff min distance <= threshold
    operator_scorer,       # reads a human verdict recorded during the rollout
)
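
To make the trajectory-only idea concrete, here is a minimal sketch of how a min-distance scorer and a threshold scorer could be derived from recorded steps. The `Step` and `Record` classes are hypothetical stand-ins, not the robolens types, and the helper names are illustrative; only the `info["distance"]` key comes from the list above.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the recorded types; the real robolens classes may differ.
@dataclass(frozen=True)
class Step:
    info: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Record:
    steps: tuple = ()


def min_distance(record: Record) -> float:
    """Smallest recorded info["distance"] across the saved trajectory."""
    distances = [s.info["distance"] for s in record.steps if "distance" in s.info]
    if not distances:
        raise ValueError("no step recorded a distance")
    return min(distances)


def reached_goal(record: Record, threshold: float) -> float:
    """1.0 if the closest approach is within the threshold, else 0.0."""
    return 1.0 if min_distance(record) <= threshold else 0.0


# A three-step log; no environment is needed to score it.
record = Record(steps=(Step({"distance": 0.40}), Step({"distance": 0.08}), Step({"distance": 0.15})))
closest = min_distance(record)
hit = reached_goal(record, threshold=0.10)
miss = reached_goal(record, threshold=0.05)
```

Because both functions read only the saved steps, re-running them on the same log always gives the same result.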

Custom scorers

A scorer is any object with a name and a __call__(record, target) -> Score:

from dataclasses import dataclass

import numpy as np

from robolens.scorer import Score

@dataclass(frozen=True)
class SmoothMotion:
    name: str = "smooth_motion"

    def __call__(self, record, target) -> Score:
        # Per-step command magnitude: sum of absolute action values.
        # Assumes s.action.data is array-like; float() wraps the reduced scalar.
        magnitudes = [float(np.abs(s.action.data).sum()) for s in record.steps]
        return Score(value=-sum(magnitudes), explanation="negative total command magnitude")

Register it with scorer to resolve it by name.
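
The "any object with a name and a __call__" contract can be written down as a structural type. The sketch below is illustrative only: `Score` is a local stand-in for `robolens.scorer.Score`, and `ScorerLike` and `StepBudget` are hypothetical names that are not part of the library.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Protocol, runtime_checkable


# Hypothetical stand-in for robolens.scorer.Score, so the sketch runs on its own.
@dataclass(frozen=True)
class Score:
    value: float
    explanation: str = ""


@runtime_checkable
class ScorerLike(Protocol):
    """Structural shape of a scorer: a name plus __call__(record, target)."""

    name: str

    def __call__(self, record: Any, target: Any) -> Score: ...


@dataclass(frozen=True)
class StepBudget:
    """Toy scorer: 1.0 if the episode used at most max_steps steps, else 0.0."""

    max_steps: int
    name: str = "step_budget"

    def __call__(self, record: Any, target: Any) -> Score:
        used = len(record.steps)
        ok = used <= self.max_steps
        return Score(value=1.0 if ok else 0.0, explanation=f"{used} of {self.max_steps} steps used")


scorer_obj = StepBudget(max_steps=3)
fake_record = SimpleNamespace(steps=["s0", "s1"])  # stand-in for a recorded trial
result = scorer_obj(fake_record, target=None)
conforms = isinstance(scorer_obj, ScorerLike)
```

No base class is involved: `StepBudget` satisfies the protocol purely by having the right attributes, which is what lets a frozen dataclass, a plain class, or any other callable object serve as a scorer.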

Epochs and reducers

When a Task runs epochs > 1, an epoch reducer collapses the per-epoch scores of a scene before metrics aggregate across scenes. Reducers are namespaced separately from metrics and are selected by name on Epochs:

Reducer                  Meaning
mean, median, max, min   numeric reductions (raise on non-numeric strings)
mode                     most common value (works for categorical scores)
pass_at_<k>              unbiased pass@k estimator (success = value ≥ 0.5)

from robolens.task import Epochs, Task
Task(..., epochs=Epochs(count=5, reducer="pass_at_2"))
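
For reference, the commonly used unbiased pass@k estimator is 1 - C(n - c, k) / C(n, k), where n is the number of epochs and c the number of successful ones. The sketch below applies it to one scene's per-epoch scores with the 0.5 success threshold from the table. Whether robolens computes `pass_at_<k>` exactly this way is an assumption, and the function name and signature here are illustrative.

```python
from math import comb


def pass_at_k(values, k, threshold=0.5):
    """Estimate the chance that at least one of k sampled epochs succeeds."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of epochs")
    c = sum(1 for v in values if v >= threshold)  # successful epochs
    # comb(n - c, k) / comb(n, k) is the probability that a random k-subset
    # of the n epochs contains no success; comb returns 0 when n - c < k.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Five epochs of one scene, two of them successful, reduced with k = 2.
epoch_scores = [1.0, 0.0, 0.0, 1.0, 0.0]
estimate = pass_at_k(epoch_scores, k=2)
```

With n = 5, c = 2, and k = 2 this gives 1 - 3/10 = 0.7, which is higher than the raw success rate of 0.4 because any one success among the two sampled epochs counts.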

Operator and VLM scoring (real world)

Real robots have no privileged success oracle. The dominant method is a human verdict, captured once during the rollout (as a transcript event) and read back by operator_scorer — keeping scoring reproducible. A VLMScorer interface is reserved for scoring final frames with a vision-language classifier.
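
As a rough illustration of the capture-once, read-back pattern, the sketch below scans a saved transcript for a single verdict event. The `Event` class, the `"operator_verdict"` kind, and the `"success"` payload key are all hypothetical; the actual robolens transcript schema and the behaviour of operator_scorer may differ.

```python
from dataclasses import dataclass


# Hypothetical transcript event; the real schema is not shown in this document.
@dataclass(frozen=True)
class Event:
    kind: str
    payload: dict


def read_operator_verdict(events) -> float:
    """Return 1.0 or 0.0 from the single human verdict stored in the transcript."""
    verdicts = [e for e in events if e.kind == "operator_verdict"]
    if len(verdicts) != 1:
        # The verdict is captured once, so zero or several is treated as an error.
        raise ValueError(f"expected exactly one operator verdict, found {len(verdicts)}")
    return 1.0 if verdicts[0].payload["success"] else 0.0


transcript = [
    Event("step", {"index": 0}),
    Event("step", {"index": 1}),
    Event("operator_verdict", {"success": True}),
]
verdict_value = read_operator_verdict(transcript)
```

Since the verdict lives in the log rather than being asked for again at scoring time, re-scoring the same transcript cannot produce a different answer.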