Run Benchmark

Math Word Problems

This benchmark is in Tier 1 (Screening).

Select Model
Models whose capability level is too low for this benchmark tier are shown as disabled.
Benchmarks can be run only from clients that connect directly from a local or private network IP address.
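How the application enforces the network restriction is not described here. As a rough illustration only, a check of this kind could look like the sketch below. It assumes that "direct" means the request did not pass through a proxy, and that the server can see the client's peer address; the function names and the `forwarded_for` parameter are hypothetical.

```python
import ipaddress
from typing import Optional


def is_local_or_private(addr: str) -> bool:
    """Return True if addr is a loopback, link-local, or private IP address."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a valid IP address string: treat as not allowed.
        return False
    return ip.is_loopback or ip.is_link_local or ip.is_private


def may_run_benchmark(peer_addr: str, forwarded_for: Optional[str] = None) -> bool:
    """Hypothetical gate: allow only direct connections from local/private IPs.

    Assumption: a non-empty forwarded-for value means the request came
    through a proxy, so it is not treated as a direct connection.
    """
    if forwarded_for:
        return False
    return is_local_or_private(peer_addr)
```

Under these assumptions, `may_run_benchmark("192.168.1.10")` would be allowed, while a public address such as `8.8.8.8`, or any request carrying a forwarded-for value, would be rejected.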
Benchmark Info

This benchmark evaluates a model's ability to read math word problems, extract the relevant numbers, and compute the correct answer. Approximately one third of the questions contain distractor (unused) information.
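To make the idea of a distractor concrete, here is a minimal sketch of what one such question and a simple scoring check might look like. The item, its field names, and the scoring rule (compare the last number in the model's output with the expected answer) are all invented for illustration; the benchmark's actual question format and grading method are not described here.

```python
import re
from typing import Optional

# Hypothetical item: the "7 years" detail is a distractor and is not needed.
ITEM = {
    "question": (
        "A shop that opened 7 years ago sells pencils for 3 dollars each. "
        "Ana buys 4 pencils. How many dollars does Ana pay?"
    ),
    "answer": 12.0,
}

NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number that appears in text, or None if there is none."""
    matches = NUMBER_RE.findall(text)
    return float(matches[-1]) if matches else None


def is_correct(model_output: str, expected: float) -> bool:
    """Assumed scoring rule: the last number in the output must equal expected."""
    value = extract_final_number(model_output)
    return value is not None and abs(value - expected) < 1e-9
```

With this sketch, an output such as "Ana pays 3 * 4 = 12 dollars." would be scored as correct, while an output that folds in the distractor, such as "3 * 4 + 7 = 19", would be scored as incorrect.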