Run Benchmark

Validate Bulk IPA/Phonetic (bebras)

This benchmark is in Tier 3 (Advanced).

Select Model
Models whose capability level is too low for this benchmark tier are shown as disabled.
Benchmarks can be run only by clients connecting directly from a local or private network IP address.
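As a rough illustration of the local/private-network restriction, the check below uses Python's standard `ipaddress` module. The function name `is_local_or_private` is hypothetical, and how the actual service determines the client address (for example, whether it rejects proxied requests) is not stated here, so treat this as a sketch under those assumptions.

```python
import ipaddress


def is_local_or_private(client_ip: str) -> bool:
    """Return True if client_ip is a loopback or private-range address.

    Hypothetical helper: the real service's rule may differ.
    """
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Not a parseable IPv4/IPv6 address: treat as not allowed.
        return False
    return addr.is_loopback or addr.is_private
```

A request handler could call this on the connecting peer's address and refuse to start a benchmark when it returns False.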
Benchmark Info

A regression benchmark for Bebras bulk pronunciation verification. Given 20-word lists with English and Chinese disambiguation, it tests whether the model returns only the words whose IPA/phonetic mappings are wrong.
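One way to picture the pass condition is as a set comparison between the words the model flags and the words known to have wrong mappings. The sketch below assumes an exact-match criterion and case-insensitive comparison. Both are assumptions, as is the function name `score_response`; the benchmark's actual scoring is not described here.

```python
def score_response(flagged, expected_wrong):
    """Compare the words a model flagged with the known-wrong words.

    Hypothetical scorer: passes only when the two sets match exactly.
    """
    flagged_set = {w.strip().lower() for w in flagged}
    expected_set = {w.strip().lower() for w in expected_wrong}
    return {
        "passed": flagged_set == expected_set,
        # Wrong mappings the model failed to report.
        "missed": sorted(expected_set - flagged_set),
        # Correct mappings the model reported as wrong.
        "false_alarms": sorted(flagged_set - expected_set),
    }
```

Under this criterion, a response that also lists correctly mapped words fails, which matches the requirement that the model return only the wrong ones.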