Run Benchmark

Validate Bulk IPA/Phonetic (bebras)

This benchmark is in Tier 3 (Advanced).

Select Model

Models whose capability level is too low for this benchmark tier are shown as disabled.

Benchmark execution is allowed only from direct local/private network IPs.
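
A restriction like this can be approximated with Python's standard `ipaddress` module. The sketch below is illustrative only: the function name `is_local_or_private` is hypothetical, and it assumes "direct" means the check runs on the connection's peer address rather than on a forwarded header. The page does not say how the application actually enforces the rule.

```python
import ipaddress


def is_local_or_private(addr: str) -> bool:
    """Return True if addr parses as a loopback or private-range IP address.

    Hypothetical helper: assumes addr is the socket peer address, not a
    value taken from a proxy header such as X-Forwarded-For.
    """
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a valid IPv4/IPv6 literal, so it cannot be allowed.
        return False
    return ip.is_loopback or ip.is_private
```

For example, `is_local_or_private("192.168.1.10")` and `is_local_or_private("127.0.0.1")` would be accepted, while a public address such as `8.8.8.8` would be rejected.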

Benchmark Info

A regression benchmark for Bebras bulk pronunciation verification. Each input is a 20-word list with English and Chinese disambiguation. The benchmark tests whether the model returns only the words whose IPA/phonetic mappings are wrong.
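
One way to picture the pass criterion is as a set comparison between the words the model flags and the words known to be wrong. The following is a minimal sketch under that assumption; `score_response`, its parameters, and the case-insensitive matching are hypothetical and are not taken from the benchmark's actual scoring code.

```python
def score_response(expected_wrong: set[str], returned: list[str]) -> dict:
    """Compare the words a model flagged against the known-wrong words.

    Hypothetical scorer: a response passes only if it flags every wrong
    word and nothing else. Matching ignores case and surrounding spaces.
    """
    got = {w.strip().lower() for w in returned if w.strip()}
    want = {w.strip().lower() for w in expected_wrong}
    missed = sorted(want - got)  # wrong words the model failed to flag
    extra = sorted(got - want)   # correct words the model flagged anyway
    return {"passed": not missed and not extra, "missed": missed, "extra": extra}
```

Under this sketch, a response that also flags a correctly transcribed word fails, because the benchmark expects only the wrong entries to be returned.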