Run Benchmark

Math Word Problems

This benchmark is in Tier 1 (Screening).

Select Model


Models whose capability level is too low for this benchmark's tier are shown as disabled.

Benchmarks can only be run from a direct (non-proxied) connection originating from a local or private network IP address.
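The page does not say how this restriction is enforced. The following is a minimal sketch that assumes the server can see the client's socket address and treats any request carrying a forwarding header (such as X-Forwarded-For) as proxied, and therefore not direct. The function name `is_direct_private_client` is hypothetical. The sketch uses Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Optional


def is_direct_private_client(remote_addr: str, forwarded_for: Optional[str] = None) -> bool:
    """Hypothetical check: allow only direct connections from loopback/private IPs."""
    # Assumption: a forwarding header means the request came through a proxy,
    # so it is not a "direct" connection and is rejected.
    if forwarded_for:
        return False
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IPv4/IPv6 address string.
        return False
    # Loopback (127.0.0.0/8, ::1) or private ranges (e.g. 10/8, 172.16/12,
    # 192.168/16, fc00::/7) as defined by the ipaddress module.
    return ip.is_loopback or ip.is_private
```

Python's `is_private` also matches some reserved and documentation ranges (for example 192.0.2.0/24). A production check might therefore compare against an explicit allowlist of networks.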

Benchmark Info

Evaluates a model's ability to read math word problems, extract the relevant numbers, and compute the correct answer. Approximately one third of the questions contain distractors: information that is not needed to solve the problem.