Run Benchmark

Validate Translation (voras)

This benchmark is in Tier 3 (Advanced).

Select Model
Models whose capability level is too low for this benchmark tier are shown as disabled.
Benchmark execution is allowed only from direct local/private network IPs.
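
A restriction like this one could be enforced with a check along the following lines. This is a minimal sketch, not the application's actual code: the function names `is_local_or_private` and `may_run_benchmark` are hypothetical, and reading "direct" as "no proxy hop" (rejecting any request that carries a forwarded-for value) is an assumption.

```python
import ipaddress
from typing import Optional


def is_local_or_private(remote_addr: str) -> bool:
    """Return True if remote_addr parses as a loopback or private-range IP."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IPv4/IPv6 literal: treat as not allowed.
        return False
    return ip.is_loopback or ip.is_private


def may_run_benchmark(remote_addr: str, forwarded_for: Optional[str] = None) -> bool:
    """Hypothetical gate: allow only direct connections from local/private IPs."""
    # Assumption: a forwarded-for value means the request came through a proxy,
    # so it is not a "direct" connection and is rejected outright.
    if forwarded_for:
        return False
    return is_local_or_private(remote_addr)
```

For example, `may_run_benchmark("192.168.1.10")` is allowed, while `may_run_benchmark("8.8.8.8")` and any call with a non-empty `forwarded_for` are refused.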
Benchmark Info

A regression benchmark for the voras agent's validate_all_translations_for_word() function. It tests whether the LLM correctly flags translations that are semantically incorrect or not in lemma (dictionary) form across multiple target languages.
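
As an illustration of what a single case in such a benchmark might look like, the sketch below compares the languages a model flagged as invalid against the languages expected to be flagged. Everything here is hypothetical: `RegressionCase`, `score_case`, and the sample data are illustrative assumptions and do not reflect the real signature or return type of validate_all_translations_for_word().

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class RegressionCase:
    """Hypothetical test case: one source word and its candidate translations."""
    word: str
    translations: Dict[str, str]      # language code -> candidate translation
    expected_invalid: FrozenSet[str]  # language codes that should be flagged


def score_case(case: RegressionCase, flagged: Iterable[str]) -> dict:
    """Compare the languages the model flagged with the expected ones."""
    # Ignore flagged languages that are not part of this case.
    flagged_set = set(flagged) & set(case.translations)
    expected = set(case.expected_invalid)
    return {
        "missed": sorted(expected - flagged_set),        # bad translations not caught
        "false_alarms": sorted(flagged_set - expected),  # good translations flagged
        "passed": flagged_set == expected,
    }


# Sample data (assumed): "mangeons" is an inflected form rather than the lemma
# "manger", and "beber" means "to drink", so both should be flagged.
case = RegressionCase(
    word="eat",
    translations={"de": "essen", "fr": "mangeons", "es": "beber"},
    expected_invalid=frozenset({"fr", "es"}),
)
```

A model that flags only `fr` would miss the semantic error in `es`, so `score_case(case, {"fr"})` reports `es` under `missed` and the case fails. Tracking misses and false alarms separately makes it easier to see whether a regression comes from the model becoming too lenient or too strict.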