Word Length
0011_word_length
T1
A benchmark to evaluate a model's ability to count the to...
|
100
714.0ms median
730µ$
|
100
1223.5ms median
80µ$
|
42
915.5ms median
16µ$
|
98
766.5ms median
50µ$
cost warning
|
100
654.5ms median
37µ$
|
18
640.0ms median
82µ$
1 latency outlier
|
60
725.0ms median
120µ$
1 latency outlier
|
20
292.0ms median
63µ$
1 latency outlier
|
85
1277.5ms median
169µ$
|
52
1378.0ms median
73µ$
|
90
623.0ms median
32µ$
|
22
779.0ms median
96µ$
1 latency outlier
|
28
592.0ms median
29µ$
|
95
565.0ms median
132µ$
1 latency outlier
|
75
623.0ms median
138µ$
1 latency outlier
|
28
188.0ms median
33µ$
1 latency outlier
|
45
564.0ms median
122µ$
1 latency outlier
|
82
521.0ms median
114µ$
1 latency outlier
|
15
332.0ms median
60µ$
1 latency outlier
|
100
938.0ms median
54µ$
|
72
219.0ms median
63µ$
1 latency outlier
|
57
423.5ms median
93µ$
|
48
2558.0ms median
213µ$
1 latency outlier
|
88
446.5ms median
21µ$
|
72
911.0ms median
46µ$
|
92
1207.5ms median
62µ$
|
35
198.5ms median
26µ$
|
Letter Count
0012_letter_count
T1
A benchmark to evaluate a model's ability to count how ma...
|
78
691.0ms median
717µ$
|
48
1130.0ms median
77µ$
|
42
946.0ms median
15µ$
|
80
690.5ms median
43µ$
cost warning
|
28
714.5ms median
35µ$
|
12
642.5ms median
35µ$
|
30
599.5ms median
31µ$
|
20
288.0ms median
14µ$
|
52
1327.5ms median
70µ$
|
72
1403.0ms median
77µ$
|
35
649.0ms median
33µ$
|
52
652.5ms median
34µ$
|
15
540.5ms median
27µ$
|
15
568.0ms median
27µ$
|
35
605.5ms median
28µ$
|
42
192.0ms median
10µ$
|
32
545.0ms median
26µ$
|
45
442.0ms median
24µ$
|
12
350.0ms median
18µ$
|
48
1119.5ms median
53µ$
|
22
178.5ms median
9µ$
|
48
379.0ms median
19µ$
|
55
2536.0ms median
130µ$
|
20
445.0ms median
22µ$
|
38
896.0ms median
45µ$
|
50
1346.0ms median
65µ$
|
20
185.5ms median
10µ$
|
Vowel Count
0013_vowel_count
T1
Tests ability to count vowels (a, e, i, o, u and accented forms) in a word acros...
|
80
572.0ms median
791µ$
|
88
1134.0ms median
97µ$
|
55
948.5ms median
19µ$
|
98
818.5ms median
44µ$
cost warning
|
50
691.5ms median
50µ$
|
40
694.0ms median
38µ$
|
38
969.5ms median
49µ$
|
28
363.5ms median
18µ$
|
42
1501.0ms median
80µ$
|
12
1664.0ms median
94µ$
|
52
869.0ms median
44µ$
|
32
796.0ms median
39µ$
|
28
723.5ms median
35µ$
|
38
695.0ms median
36µ$
|
42
731.5ms median
38µ$
|
28
238.0ms median
12µ$
|
48
721.0ms median
37µ$
|
68
635.5ms median
32µ$
|
0
430.5ms median
23µ$
|
85
1246.5ms median
63µ$
|
45
245.5ms median
12µ$
|
40
473.0ms median
24µ$
|
68
2987.0ms median
153µ$
|
15
581.0ms median
29µ$
|
25
1193.5ms median
60µ$
|
28
1972.0ms median
101µ$
|
8
244.0ms median
12µ$
|
Syllable Count
0014_syllable_count
T1
Tests ability to count syllables in words across Latin-alphabet languages....
|
92
591.5ms median
738µ$
|
90
1364.0ms median
85µ$
|
62
1023.0ms median
17µ$
|
95
717.5ms median
47µ$
cost warning
|
60
667.5ms median
40µ$
|
30
715.5ms median
38µ$
|
70
1051.0ms median
52µ$
|
22
360.0ms median
18µ$
|
48
1623.0ms median
85µ$
|
28
1860.5ms median
107µ$
|
82
843.5ms median
41µ$
|
38
863.0ms median
45µ$
|
30
747.5ms median
37µ$
|
75
711.5ms median
36µ$
|
88
729.0ms median
36µ$
|
42
238.5ms median
12µ$
|
72
710.5ms median
35µ$
|
62
666.0ms median
33µ$
|
5
433.5ms median
22µ$
|
95
1361.0ms median
71µ$
|
25
256.0ms median
13µ$
|
62
497.5ms median
25µ$
|
28
2995.0ms median
152µ$
|
12
522.0ms median
27µ$
|
52
1042.0ms median
52µ$
|
60
1679.0ms median
81µ$
|
15
244.5ms median
12µ$
|
Spell Check
0015_spell_check
T1
A benchmark to evaluate a model's ability to identify mis...
|
100
695.0ms median
796µ$
|
100
1227.5ms median
89µ$
|
95
984.5ms median
18µ$
|
100
817.5ms median
45µ$
cost warning
|
100
699.0ms median
42µ$
|
90
929.5ms median
50µ$
|
100
1430.5ms median
70µ$
|
55
507.0ms median
28µ$
|
100
2047.5ms median
106µ$
|
88
1782.0ms median
95µ$
|
82
810.5ms median
41µ$
|
95
1175.0ms median
59µ$
|
38
1096.0ms median
54µ$
|
98
947.5ms median
49µ$
|
92
960.5ms median
49µ$
|
40
337.0ms median
17µ$
|
90
1013.0ms median
51µ$
|
85
810.0ms median
43µ$
|
38
589.0ms median
29µ$
|
95
1856.0ms median
96µ$
|
75
290.5ms median
15µ$
|
88
620.0ms median
32µ$
|
100
3221.5ms median
166µ$
|
82
759.0ms median
39µ$
|
90
1395.0ms median
71µ$
|
95
2139.5ms median
107µ$
|
25
326.5ms median
16µ$
|
Antonym Identification
0016_antonym
T1
A benchmark to evaluate a model's ability to identify the...
|
100
570.5ms median
723µ$
|
100
1141.0ms median
80µ$
|
100
984.5ms median
16µ$
|
100
768.5ms median
46µ$
cost warning
|
100
673.5ms median
37µ$
|
100
737.5ms median
37µ$
|
100
928.0ms median
47µ$
|
90
393.0ms median
20µ$
|
100
1702.5ms median
86µ$
|
95
1633.5ms median
90µ$
|
98
763.5ms median
38µ$
|
100
713.0ms median
37µ$
|
38
746.0ms median
39µ$
|
100
672.0ms median
33µ$
|
100
693.0ms median
35µ$
|
62
251.5ms median
13µ$
|
100
657.0ms median
33µ$
|
100
631.0ms median
32µ$
|
8
471.0ms median
25µ$
|
100
1248.5ms median
63µ$
|
92
217.0ms median
11µ$
|
98
437.0ms median
22µ$
|
100
2860.5ms median
146µ$
|
100
546.0ms median
27µ$
|
100
1053.5ms median
52µ$
|
100
1534.5ms median
77µ$
|
30
230.5ms median
12µ$
|
Multilingual Synonym Generation
0017_synonyms
T1
A benchmark to evaluate a model's ability to generate noun synonyms ...
|
98
595.5ms median
739µ$
|
100
1124.5ms median
83µ$
|
96
853.5ms median
17µ$
|
100
724.5ms median
43µ$
cost warning
|
100
680.0ms median
39µ$
|
77
790.0ms median
42µ$
|
96
1050.0ms median
53µ$
|
31
390.0ms median
20µ$
|
98
1764.0ms median
92µ$
|
67
1783.0ms median
96µ$
|
94
776.0ms median
39µ$
|
90
1023.5ms median
55µ$
|
17
861.0ms median
49µ$
|
79
787.0ms median
44µ$
|
81
803.5ms median
44µ$
|
21
258.5ms median
13µ$
|
69
749.5ms median
38µ$
|
77
812.5ms median
42µ$
|
2
564.0ms median
31µ$
|
94
1707.5ms median
91µ$
|
88
247.5ms median
13µ$
|
96
513.5ms median
27µ$
|
100
3064.0ms median
158µ$
|
88
570.5ms median
28µ$
|
92
1078.5ms median
58µ$
|
94
1620.0ms median
85µ$
|
15
285.5ms median
15µ$
|
Pinyin Letter Count
0018_pinyin_letters
T1
A benchmark to evaluate a model's ability to count how many times a s...
|
35
671.5ms median
879µ$
|
35
1135.0ms median
119µ$
|
15
1066.0ms median
24µ$
|
40
740.0ms median
37µ$
cost warning
|
20
682.5ms median
68µ$
|
30
692.5ms median
41µ$
|
30
810.0ms median
45µ$
|
60
349.0ms median
18µ$
|
50
1626.0ms median
92µ$
|
35
1446.0ms median
97µ$
|
15
1059.0ms median
55µ$
|
25
772.5ms median
39µ$
|
15
678.5ms median
37µ$
|
25
533.0ms median
30µ$
|
15
526.0ms median
29µ$
|
10
203.0ms median
11µ$
|
15
567.0ms median
31µ$
|
20
466.5ms median
27µ$
|
0
766.0ms median
41µ$
|
35
1092.0ms median
62µ$
|
5
220.5ms median
12µ$
|
5
414.0ms median
22µ$
|
20
3396.5ms median
174µ$
|
5
547.0ms median
30µ$
|
0
1344.0ms median
76µ$
|
25
2369.5ms median
117µ$
|
35
220.5ms median
12µ$
|
Simple Arithmetic
0021_simple_arithmetic
T1
A benchmark to evaluate a model's ability to perform basic arithmeti...
|
100
602.0ms median
703µ$
|
100
1066.5ms median
73µ$
|
100
881.5ms median
15µ$
|
100
719.5ms median
38µ$
cost warning
|
100
691.5ms median
32µ$
|
98
649.0ms median
35µ$
|
100
663.5ms median
34µ$
|
98
310.5ms median
16µ$
|
100
1489.0ms median
78µ$
|
95
1511.0ms median
79µ$
|
100
620.5ms median
31µ$
|
100
706.0ms median
32µ$
|
57
493.0ms median
26µ$
|
100
563.0ms median
28µ$
|
90
607.0ms median
30µ$
|
100
187.0ms median
9µ$
|
100
635.5ms median
32µ$
|
100
433.0ms median
23µ$
|
10
355.0ms median
18µ$
|
100
1112.0ms median
51µ$
|
100
202.0ms median
10µ$
|
100
410.5ms median
21µ$
|
100
2982.5ms median
157µ$
|
100
444.0ms median
23µ$
|
100
838.5ms median
42µ$
|
98
1231.5ms median
59µ$
|
95
196.0ms median
10µ$
|
Unit Conversion
0022_unit_conversion
T1
A benchmark to evaluate a model's ability to accurately convert ...
|
100
613.0ms median
712µ$
|
100
1172.0ms median
80µ$
|
72
901.5ms median
16µ$
|
100
729.0ms median
40µ$
cost warning
|
100
689.5ms median
35µ$
|
25
717.5ms median
39µ$
|
98
1108.0ms median
56µ$
|
12
346.0ms median
17µ$
|
95
1856.0ms median
97µ$
|
25
2211.0ms median
119µ$
|
92
732.0ms median
39µ$
|
88
878.5ms median
50µ$
|
18
881.5ms median
45µ$
|
80
634.5ms median
32µ$
|
85
633.5ms median
32µ$
|
22
227.5ms median
12µ$
|
85
834.0ms median
43µ$
|
98
645.0ms median
32µ$
|
5
427.5ms median
23µ$
|
98
1219.0ms median
64µ$
|
68
295.0ms median
16µ$
|
85
617.5ms median
33µ$
|
95
3234.5ms median
161µ$
|
75
543.5ms median
31µ$
|
90
1235.0ms median
64µ$
|
92
1915.0ms median
95µ$
|
50
219.5ms median
13µ$
|
Math Word Problems
0023_word_problems
T1
A benchmark to evaluate a model's ability to read math word problems...
|
100
537.5ms median
728µ$
|
98
1082.0ms median
78µ$
|
95
933.0ms median
16µ$
|
100
692.0ms median
44µ$
cost warning
|
100
711.0ms median
37µ$
|
75
672.0ms median
34µ$
|
100
900.0ms median
45µ$
|
68
338.0ms median
17µ$
|
100
1466.5ms median
75µ$
|
45
1734.5ms median
97µ$
|
98
653.0ms median
34µ$
|
90
766.5ms median
39µ$
|
68
653.5ms median
33µ$
|
100
570.0ms median
31µ$
|
70
606.5ms median
30µ$
|
75
188.0ms median
10µ$
|
100
635.5ms median
33µ$
|
100
447.5ms median
24µ$
|
20
396.0ms median
21µ$
|
100
1086.5ms median
55µ$
|
92
217.5ms median
11µ$
|
92
422.0ms median
22µ$
|
100
2934.5ms median
152µ$
|
100
485.0ms median
25µ$
|
95
962.0ms median
49µ$
|
72
1678.0ms median
84µ$
|
62
202.0ms median
11µ$
|
Fractions and Percentages
0024_percentage_math
T1
A benchmark to evaluate a model's ability to calculate percentages a...
|
100
611.5ms median
708µ$
|
100
1025.0ms median
74µ$
|
100
897.5ms median
15µ$
|
100
710.0ms median
121µ$
|
100
682.0ms median
33µ$
|
90
672.0ms median
34µ$
|
98
718.5ms median
37µ$
|
52
296.5ms median
15µ$
|
100
1444.5ms median
72µ$
|
72
1621.0ms median
97µ$
|
98
661.0ms median
32µ$
|
100
723.5ms median
37µ$
|
48
614.5ms median
32µ$
|
90
571.5ms median
28µ$
|
100
615.0ms median
31µ$
|
50
187.0ms median
10µ$
|
95
623.0ms median
31µ$
|
98
457.5ms median
24µ$
|
5
378.0ms median
20µ$
|
100
915.5ms median
51µ$
|
95
204.0ms median
10µ$
|
100
423.5ms median
21µ$
|
100
2957.0ms median
150µ$
|
100
485.0ms median
25µ$
|
100
938.5ms median
46µ$
|
100
1437.0ms median
71µ$
|
68
210.0ms median
11µ$
|
Algebra
0025_algebra
T1
A benchmark to evaluate a model's ability to solve linear and quadra...
|
100
613.0ms median
721µ$
|
98
1029.0ms median
79µ$
|
80
918.0ms median
16µ$
|
100
721.0ms median
134µ$
|
100
701.0ms median
36µ$
|
42
649.5ms median
37µ$
|
72
976.5ms median
50µ$
|
30
322.0ms median
17µ$
|
85
1374.5ms median
77µ$
|
12
2798.0ms median
172µ$
1 latency outlier
|
75
707.0ms median
35µ$
|
45
715.0ms median
40µ$
|
10
620.5ms median
36µ$
|
48
588.5ms median
35µ$
|
55
739.5ms median
37µ$
|
42
215.0ms median
11µ$
|
48
612.0ms median
31µ$
|
62
594.5ms median
31µ$
|
2
425.0ms median
22µ$
|
90
918.5ms median
56µ$
|
60
204.0ms median
11µ$
|
65
401.0ms median
21µ$
|
100
2889.5ms median
151µ$
|
50
472.0ms median
27µ$
|
80
925.0ms median
49µ$
|
52
1503.5ms median
76µ$
|
35
197.0ms median
12µ$
|
Time Arithmetic
0026_time_arithmetic
T1
A benchmark to evaluate a model's ability to add and subtract ...
|
100
612.0ms median
730µ$
|
100
1129.5ms median
82µ$
|
82
1021.0ms median
16µ$
|
100
712.5ms median
140µ$
|
100
702.0ms median
38µ$
|
35
830.5ms median
44µ$
|
70
1012.5ms median
51µ$
|
8
433.5ms median
22µ$
|
70
1704.0ms median
89µ$
|
70
1986.5ms median
101µ$
|
78
801.0ms median
38µ$
|
60
917.5ms median
47µ$
|
18
863.0ms median
43µ$
|
65
580.5ms median
31µ$
|
55
706.0ms median
36µ$
|
2
257.5ms median
16µ$
|
55
695.0ms median
35µ$
|
65
518.0ms median
27µ$
|
0
452.0ms median
23µ$
|
90
1339.0ms median
64µ$
|
28
249.5ms median
13µ$
|
65
487.0ms median
25µ$
|
75
3312.5ms median
169µ$
|
40
528.0ms median
28µ$
|
65
1153.5ms median
59µ$
|
85
1493.0ms median
77µ$
|
5
274.5ms median
13µ$
|
Geometry
0027_geometry
T1
A benchmark to evaluate a model's ability to calculate area, perimet...
|
100
612.5ms median
714µ$
|
100
1072.5ms median
76µ$
|
98
919.5ms median
15µ$
|
100
715.0ms median
126µ$
|
100
702.5ms median
34µ$
|
68
666.0ms median
34µ$
|
95
767.0ms median
42µ$
|
30
314.5ms median
17µ$
|
95
1464.5ms median
77µ$
|
38
2005.5ms median
120µ$
|
95
685.0ms median
36µ$
|
72
731.5ms median
39µ$
|
20
608.0ms median
32µ$
|
82
571.5ms median
30µ$
|
60
621.5ms median
33µ$
|
40
197.5ms median
10µ$
|
82
654.5ms median
34µ$
|
95
467.0ms median
26µ$
|
5
376.5ms median
19µ$
|
100
1121.5ms median
55µ$
|
80
210.0ms median
11µ$
|
98
428.0ms median
23µ$
|
100
3050.5ms median
156µ$
|
60
496.0ms median
26µ$
|
88
976.5ms median
51µ$
|
100
1441.0ms median
74µ$
|
45
210.5ms median
11µ$
|
Definitions
0031_definitions
T2
A benchmark to evaluate a model's ability to identify the...
|
100
511.0ms median
103µ$
|
100
1026.5ms median
63µ$
|
100
920.0ms median
13µ$
|
100
689.0ms median
96µ$
|
100
663.5ms median
26µ$
|
98
299.0ms median
16µ$
|
100
560.0ms median
29µ$
|
2
679.5ms median
33µ$
|
-
|
98
19474.0ms median
984µ$
|
100
419.5ms median
21µ$
|
100
457.0ms median
25µ$
|
10
646.0ms median
114µ$
1 latency outlier
|
98
396.5ms median
20µ$
|
98
398.5ms median
21µ$
|
60
86.0ms median
5µ$
|
100
395.0ms median
20µ$
|
100
372.5ms median
106µ$
|
0
599.5ms median
234µ$
2 latency outliers
|
100
850.0ms median
43µ$
|
-
|
-
|
-
|
100
229.5ms median
11µ$
|
100
19892.5ms median
989µ$
|
-
|
65
114.0ms median
6µ$
|
Part of Speech
0032_part_of_speech
T2
A benchmark to evaluate a model's ability to identify the...
|
98
597.0ms median
798µ$
|
98
1125.5ms median
99µ$
|
98
919.5ms median
20µ$
|
98
712.5ms median
189µ$
|
98
734.0ms median
51µ$
|
92
835.0ms median
45µ$
|
98
1102.0ms median
55µ$
|
70
412.0ms median
21µ$
|
100
1830.5ms median
95µ$
|
98
1449.5ms median
76µ$
|
95
942.0ms median
46µ$
|
95
971.5ms median
49µ$
|
90
831.0ms median
42µ$
|
98
758.0ms median
38µ$
|
95
815.5ms median
40µ$
|
70
272.0ms median
14µ$
|
95
785.0ms median
40µ$
|
95
716.0ms median
36µ$
|
10
523.0ms median
28µ$
|
100
1422.5ms median
71µ$
|
-
|
-
|
-
|
98
572.5ms median
28µ$
|
100
1343.0ms median
69µ$
|
100
2000.5ms median
99µ$
|
78
264.5ms median
13µ$
|
English Plural Generation
0033_plural
T2
A benchmark to evaluate a model's ability to produce the correct plu...
|
100
613.0ms median
830µ$
|
100
1125.0ms median
109µ$
|
100
992.0ms median
22µ$
|
100
694.5ms median
225µ$
|
100
682.5ms median
61µ$
|
92
717.5ms median
39µ$
|
100
926.0ms median
48µ$
|
88
321.5ms median
16µ$
|
100
1404.5ms median
76µ$
|
98
1074.0ms median
57µ$
|
92
1034.5ms median
50µ$
|
98
849.0ms median
43µ$
|
85
660.0ms median
35µ$
|
100
698.0ms median
35µ$
|
98
715.0ms median
36µ$
|
72
248.0ms median
13µ$
|
92
624.0ms median
32µ$
|
95
663.5ms median
33µ$
|
25
419.0ms median
21µ$
|
100
1205.0ms median
63µ$
|
-
|
-
|
-
|
90
577.5ms median
29µ$
|
95
1380.0ms median
68µ$
|
100
2198.0ms median
109µ$
|
92
232.0ms median
12µ$
|
Word to IPA
0061_word_to_ipa
T3
A benchmark to evaluate a model's ability to convert words from mult...
|
N/A
|
78
1494.5ms median
95µ$
|
N/A
|
80
1091.0ms median
536µ$
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
Sentence Decomposition
0062_sentence_decomposition
T3
A benchmark to evaluate a model's ability to produce multilingual ...
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
Translation en_fr
0101_translation_en_fr
T1
A benchmark to evaluate a model's ability to translate ...
|
98
612.0ms median
750µ$
|
95
1007.0ms median
118µ$
|
85
921.5ms median
24µ$
|
98
667.0ms median
153µ$
|
95
686.5ms median
41µ$
|
72
775.0ms median
87µ$
1 latency outlier
|
88
1157.0ms median
84µ$
|
52
393.5ms median
19µ$
|
88
1828.5ms median
98µ$
|
88
1446.5ms median
73µ$
|
92
623.5ms median
34µ$
|
88
1131.0ms median
56µ$
|
28
994.0ms median
123µ$
1 latency outlier
|
90
882.5ms median
44µ$
|
85
817.5ms median
42µ$
|
48
248.0ms median
13µ$
|
88
747.0ms median
38µ$
|
82
739.0ms median
37µ$
|
15
716.5ms median
46µ$
|
90
1588.5ms median
79µ$
|
88
219.0ms median
12µ$
|
90
495.0ms median
25µ$
|
92
3849.5ms median
197µ$
|
95
571.0ms median
27µ$
|
95
1083.5ms median
54µ$
|
95
1699.0ms median
87µ$
|
32
297.5ms median
15µ$
|
Translation en_es
0102_translation_en_es
T1
A benchmark to evaluate a model's ability to translate ...
|
98
611.0ms median
750µ$
|
95
1066.0ms median
118µ$
|
98
936.5ms median
24µ$
|
98
693.5ms median
153µ$
|
95
680.0ms median
41µ$
|
90
763.0ms median
39µ$
|
98
1151.0ms median
57µ$
|
55
391.5ms median
19µ$
|
95
1901.0ms median
95µ$
|
90
1490.5ms median
76µ$
|
90
801.0ms median
39µ$
|
92
1129.0ms median
56µ$
|
18
963.5ms median
49µ$
|
95
903.0ms median
44µ$
|
98
823.0ms median
41µ$
|
40
240.5ms median
13µ$
|
92
747.5ms median
38µ$
|
75
695.0ms median
36µ$
|
10
668.5ms median
37µ$
|
92
1597.0ms median
79µ$
|
85
217.0ms median
11µ$
|
95
498.0ms median
25µ$
|
95
3811.0ms median
195µ$
|
90
561.5ms median
26µ$
|
98
1109.0ms median
56µ$
|
95
1710.5ms median
87µ$
|
32
304.0ms median
15µ$
|
Translation en_de
0103_translation_en_de
T1
A benchmark to evaluate a model's ability to translate ...
|
100
613.0ms median
759µ$
|
95
1002.0ms median
119µ$
|
92
920.0ms median
24µ$
|
98
706.0ms median
156µ$
|
95
694.0ms median
42µ$
|
85
789.0ms median
40µ$
|
90
1167.0ms median
57µ$
|
48
383.5ms median
19µ$
|
92
1888.0ms median
96µ$
|
88
1446.0ms median
80µ$
|
92
788.0ms median
38µ$
|
92
1131.0ms median
57µ$
|
18
997.5ms median
51µ$
|
85
918.0ms median
46µ$
|
92
844.5ms median
43µ$
|
45
247.5ms median
13µ$
|
88
797.5ms median
40µ$
|
92
758.0ms median
38µ$
|
8
1008.0ms median
53µ$
|
88
1609.5ms median
82µ$
|
92
232.5ms median
12µ$
|
95
509.5ms median
26µ$
|
95
3964.5ms median
202µ$
|
95
566.5ms median
27µ$
|
95
1122.5ms median
58µ$
|
92
1762.0ms median
91µ$
|
25
307.5ms median
15µ$
|
Translation fr_es
0104_translation_fr_es
T1
A benchmark to evaluate a model's ability to translate ...
|
100
594.0ms median
750µ$
1 excluded Q
|
100
1014.0ms median
118µ$
1 excluded Q
|
87
978.0ms median
24µ$
1 excluded Q
|
100
685.0ms median
154µ$
1 excluded Q
|
92
687.0ms median
42µ$
1 excluded Q
|
77
759.0ms median
38µ$
1 excluded Q
|
85
1085.0ms median
54µ$
1 excluded Q
|
46
384.0ms median
19µ$
1 excluded Q
|
82
1858.0ms median
94µ$
1 excluded Q
|
87
1522.0ms median
77µ$
1 excluded Q
|
82
790.0ms median
37µ$
1 excluded Q
|
87
1103.0ms median
56µ$
1 excluded Q
|
13
999.0ms median
48µ$
1 excluded Q
|
85
905.0ms median
44µ$
1 excluded Q
|
82
814.0ms median
40µ$
1 excluded Q
|
31
247.0ms median
13µ$
1 excluded Q
|
82
736.0ms median
36µ$
1 excluded Q
|
69
755.0ms median
37µ$
1 excluded Q
|
0
720.0ms median
42µ$
1 excluded Q
|
92
1536.0ms median
76µ$
1 excluded Q
|
85
221.0ms median
11µ$
1 excluded Q
|
82
483.0ms median
25µ$
1 excluded Q
|
90
3752.0ms median
189µ$
1 excluded Q
|
85
548.0ms median
26µ$
1 excluded Q
|
92
1191.0ms median
60µ$
1 excluded Q
|
82
1781.0ms median
95µ$
1 excluded Q
|
36
302.0ms median
15µ$
1 excluded Q
|
Translation en_zh
0105_translation_en_zh
T1
A benchmark to evaluate a model's ability to translate ...
|
100
611.5ms median
761µ$
|
98
981.5ms median
119µ$
|
90
921.0ms median
24µ$
|
100
762.5ms median
156µ$
|
100
701.0ms median
42µ$
|
92
786.0ms median
39µ$
|
92
1095.0ms median
54µ$
|
72
381.0ms median
19µ$
|
98
1875.5ms median
95µ$
|
57
1612.0ms median
86µ$
|
92
720.0ms median
35µ$
|
98
1103.0ms median
55µ$
|
15
1048.0ms median
55µ$
|
80
902.0ms median
44µ$
|
48
901.0ms median
47µ$
|
48
258.5ms median
13µ$
|
92
874.5ms median
43µ$
|
88
914.0ms median
44µ$
|
5
764.5ms median
47µ$
|
98
1893.5ms median
92µ$
|
95
229.5ms median
11µ$
|
98
523.0ms median
26µ$
|
100
3988.5ms median
208µ$
|
100
529.0ms median
26µ$
|
100
1039.0ms median
53µ$
|
98
1732.0ms median
90µ$
|
42
335.0ms median
18µ$
|
Translation en_ja
0106_translation_en_ja
T1
A benchmark to evaluate a model's ability to translate ...
|
100
613.0ms median
760µ$
|
98
1006.5ms median
120µ$
|
85
1020.0ms median
24µ$
|
100
721.5ms median
159µ$
|
100
714.0ms median
43µ$
|
85
805.0ms median
41µ$
|
70
1154.5ms median
57µ$
|
60
398.0ms median
19µ$
|
90
1921.0ms median
97µ$
|
72
1565.5ms median
84µ$
|
90
643.5ms median
34µ$
|
88
1131.5ms median
56µ$
|
22
1056.5ms median
53µ$
|
68
903.0ms median
44µ$
|
75
859.5ms median
44µ$
|
30
240.0ms median
13µ$
|
82
789.0ms median
40µ$
|
75
932.5ms median
46µ$
|
10
786.5ms median
48µ$
|
92
1874.5ms median
92µ$
|
82
241.0ms median
12µ$
|
95
544.5ms median
27µ$
|
100
3895.5ms median
203µ$
|
85
494.0ms median
24µ$
|
95
1072.5ms median
54µ$
|
95
1703.5ms median
90µ$
|
35
346.0ms median
17µ$
|
Translation fr_ko
0107_translation_fr_ko
T1
A benchmark to evaluate a model's ability to translate ...
|
100
514.5ms median
765µ$
|
100
1059.5ms median
121µ$
|
90
923.0ms median
24µ$
|
100
704.0ms median
161µ$
|
98
712.0ms median
44µ$
|
80
842.0ms median
42µ$
|
85
1246.0ms median
61µ$
|
48
409.5ms median
20µ$
|
80
2076.5ms median
102µ$
|
85
1687.0ms median
96µ$
|
88
817.0ms median
40µ$
|
82
1084.0ms median
53µ$
|
15
1167.5ms median
61µ$
|
75
915.5ms median
46µ$
|
72
827.5ms median
42µ$
|
18
263.5ms median
15µ$
|
70
746.5ms median
37µ$
|
52
893.5ms median
46µ$
|
8
926.5ms median
55µ$
|
95
1952.0ms median
96µ$
|
78
237.0ms median
12µ$
|
92
542.5ms median
27µ$
|
98
3923.0ms median
201µ$
|
85
584.0ms median
29µ$
|
95
1135.0ms median
57µ$
|
92
1722.5ms median
89µ$
|
15
398.5ms median
21µ$
|
Translation it_lt
0108_translation_it_lt
T1
A benchmark to evaluate a model's ability to translate ...
|
95
603.5ms median
760µ$
|
92
992.5ms median
122µ$
|
85
935.5ms median
25µ$
|
98
685.5ms median
164µ$
|
92
699.5ms median
44µ$
|
65
870.0ms median
43µ$
|
85
1253.0ms median
64µ$
|
15
406.0ms median
20µ$
|
88
2107.0ms median
105µ$
|
55
1882.0ms median
100µ$
|
55
849.5ms median
42µ$
|
55
1155.0ms median
58µ$
|
12
1053.5ms median
52µ$
|
65
969.0ms median
48µ$
|
68
975.5ms median
49µ$
|
10
282.5ms median
14µ$
|
68
804.0ms median
41µ$
|
30
914.5ms median
47µ$
|
0
916.5ms median
54µ$
|
90
1964.0ms median
96µ$
|
60
264.0ms median
13µ$
|
78
592.5ms median
29µ$
|
92
3972.5ms median
205µ$
|
55
451.5ms median
24µ$
|
95
1272.0ms median
64µ$
|
88
1902.0ms median
96µ$
|
18
325.0ms median
17µ$
|
Translation ja_lt
0109_translation_ja_lt
T1
A benchmark to evaluate a model's ability to translate ...
|
98
560.5ms median
760µ$
|
98
1038.0ms median
122µ$
|
98
946.0ms median
24µ$
|
100
685.0ms median
163µ$
|
100
664.0ms median
44µ$
|
65
865.0ms median
43µ$
|
92
1255.0ms median
63µ$
|
22
410.5ms median
20µ$
|
100
2042.5ms median
102µ$
|
55
1842.0ms median
101µ$
|
80
829.5ms median
42µ$
|
55
1149.0ms median
58µ$
|
15
1017.0ms median
51µ$
|
75
952.0ms median
48µ$
|
75
945.5ms median
48µ$
|
20
267.0ms median
16µ$
|
68
789.0ms median
41µ$
|
28
906.5ms median
46µ$
|
8
822.5ms median
51µ$
|
90
1917.5ms median
96µ$
|
75
256.5ms median
13µ$
|
90
577.5ms median
29µ$
|
98
3910.5ms median
213µ$
|
82
447.0ms median
24µ$
|
95
1128.0ms median
57µ$
|
100
1827.5ms median
90µ$
|
15
315.0ms median
16µ$
|
Verb Forms
0121_verb_forms
T3
A benchmark to evaluate a model's ability to generate full verb-form...
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
Lemma Identification
0122_lemma
T3
A benchmark to evaluate a model's ability to identify the lemma (base ...
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
Validate Lemma Form (lokys)
0130_validate_lemma_form
T3
A regression benchmark for the lokys agent's validate_lemma_form() f...
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
Validate Definition (lokys)
0131_validate_definition
T3
A regression benchmark for the lokys agent's validate_definition() f...
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
Validate Translation (voras)
0132_validate_translation
T3
A regression benchmark for the voras agent's validate_all_translatio...
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
Validate Bulk IPA/Phonetic (bebras)
0141_validate_pronunciation_bulk
T3
A regression benchmark for Bebras bulk pronunciation verification. ...
|
N/A
|
N/A
|
N/A
|
20
1670.0ms median
2406µ$
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
N/A
|
Geography Knowledge
0151_geography
T2
A benchmark to evaluate a model's knowledge of world geography throu...
|
100
549.0ms median
763µ$
|
100
1212.0ms median
90µ$
|
100
993.0ms median
18µ$
|
100
697.5ms median
169µ$
|
100
670.0ms median
47µ$
|
78
797.5ms median
44µ$
|
95
1100.0ms median
55µ$
|
82
382.0ms median
20µ$
|
100
1686.0ms median
89µ$
|
98
1439.0ms median
77µ$
|
98
665.0ms median
36µ$
|
100
863.0ms median
45µ$
|
12
876.0ms median
55µ$
|
50
765.0ms median
38µ$
|
70
867.5ms median
49µ$
|
92
257.5ms median
13µ$
|
100
738.0ms median
37µ$
|
95
561.0ms median
29µ$
|
20
506.5ms median
26µ$
|
88
1671.0ms median
88µ$
|
-
|
-
|
-
|
95
544.5ms median
27µ$
|
100
1029.0ms median
52µ$
|
92
1658.0ms median
89µ$
|
52
256.0ms median
14µ$
|
Syllogism Validity
0152_syllogism_validity
T2
A benchmark to evaluate whether a model can determine if short ...
|
100
2677.0ms median
1923µ$
|
100
2448.5ms median
265µ$
|
100
1534.0ms median
54µ$
|
100
903.5ms median
453µ$
|
100
974.0ms median
138µ$
|
19
3209.0ms median
169µ$
|
88
6058.5ms median
311µ$
|
0
934.5ms median
49µ$
|
94
9676.5ms median
510µ$
|
88
10932.5ms median
540µ$
|
100
7651.0ms median
377µ$
|
50
8270.5ms median
435µ$
|
6
2164.0ms median
134µ$
|
62
4346.0ms median
230µ$
|
69
5841.0ms median
319µ$
|
31
925.5ms median
53µ$
|
75
3686.0ms median
200µ$
|
62
7497.5ms median
373µ$
|
62
3783.0ms median
156µ$
|
100
18457.5ms median
1041µ$
|
-
|
-
|
-
|
69
8663.0ms median
392µ$
|
69
15473.0ms median
1626µ$
3 latency outliers
|
-
|
12
1063.0ms median
54µ$
|
Book Author Match
0153_book_author_match
T2
A benchmark to evaluate matching famous books to their correct autho...
|
100
1690.5ms median
1348µ$
|
100
2239.5ms median
220µ$
|
94
1431.5ms median
44µ$
|
100
934.5ms median
361µ$
|
100
915.0ms median
95µ$
|
11
2014.5ms median
100µ$
|
100
2996.5ms median
145µ$
|
11
717.0ms median
38µ$
|
100
9186.0ms median
465µ$
|
67
4109.5ms median
229µ$
|
100
2414.0ms median
129µ$
|
94
5832.0ms median
320µ$
|
33
2283.0ms median
141µ$
|
89
3735.5ms median
193µ$
|
89
4019.5ms median
209µ$
|
44
1357.5ms median
68µ$
|
94
2826.0ms median
154µ$
|
94
4290.5ms median
216µ$
|
83
2236.5ms median
127µ$
|
94
14947.5ms median
761µ$
|
-
|
-
|
-
|
72
2935.0ms median
141µ$
|
61
6148.0ms median
375µ$
|
-
|
22
694.0ms median
37µ$
|
Food Category Classification
0154_food_category_classification
T2
A benchmark to evaluate classification of food items by category....
|
100
1688.0ms median
1243µ$
|
100
2151.5ms median
173µ$
|
100
1215.0ms median
34µ$
|
100
867.0ms median
293µ$
|
100
758.0ms median
79µ$
|
25
1420.0ms median
74µ$
|
100
2297.0ms median
116µ$
|
20
753.0ms median
40µ$
|
100
6117.0ms median
290µ$
|
90
3670.5ms median
193µ$
|
100
2197.5ms median
123µ$
|
95
4370.5ms median
240µ$
|
20
1639.5ms median
88µ$
|
95
2687.0ms median
134µ$
|
75
2668.0ms median
135µ$
|
75
558.5ms median
30µ$
|
100
2539.0ms median
138µ$
|
75
3506.5ms median
184µ$
|
20
623.0ms median
113µ$
1 latency outlier
|
100
12760.0ms median
632µ$
|
-
|
-
|
100
6767.0ms median
344µ$
|
95
2765.0ms median
136µ$
|
55
4767.0ms median
399µ$
1 latency outlier
|
-
|
20
647.0ms median
32µ$
|
Historical Event Year
0155_historical_event_year
T2
A benchmark to evaluate selecting the correct year for major histori...
|
100
1586.0ms median
1247µ$
|
100
1943.5ms median
182µ$
|
89
1320.0ms median
38µ$
|
100
895.0ms median
318µ$
|
100
817.0ms median
93µ$
|
100
1767.5ms median
91µ$
|
94
2992.0ms median
150µ$
|
44
770.0ms median
41µ$
|
100
9206.0ms median
454µ$
|
89
4637.0ms median
275µ$
|
100
2438.5ms median
136µ$
|
83
5828.0ms median
288µ$
|
11
2179.5ms median
147µ$
|
6
3402.5ms median
173µ$
|
17
4595.0ms median
232µ$
|
22
967.5ms median
56µ$
|
94
2512.0ms median
133µ$
|
67
3639.0ms median
188µ$
|
28
994.0ms median
149µ$
1 latency outlier
|
94
12527.5ms median
633µ$
|
-
|
-
|
100
9934.0ms median
494µ$
|
61
2038.5ms median
121µ$
|
78
5651.0ms median
303µ$
|
-
|
22
751.0ms median
41µ$
|
Python Hello World Function
0301_python_hello_world
T2
Write a Python 3.12 function that prints Hello world....
|
100
738.0ms median
220µ$
|
100
1251.0ms median
102µ$
|
100
2838.0ms median
20µ$
|
100
784.0ms median
196µ$
|
100
604.0ms median
53µ$
|
100
705.0ms median
35µ$
|
100
2291.0ms median
115µ$
|
100
677.0ms median
34µ$
|
100
2824.0ms median
141µ$
|
100
20546.0ms median
1027µ$
|
100
1203.0ms median
60µ$
|
100
2025.0ms median
101µ$
|
0
1999.0ms median
100µ$
|
100
1592.0ms median
80µ$
|
100
1451.0ms median
73µ$
|
0
463.0ms median
23µ$
|
100
1516.0ms median
76µ$
|
100
1637.0ms median
82µ$
|
N/A
|
100
3153.0ms median
158µ$
|
-
|
-
|
100
3346.0ms median
167µ$
|
100
793.0ms median
40µ$
|
100
29482.0ms median
1474µ$
|
-
|
100
551.0ms median
28µ$
|
Python GCD With Validation
0302_python_gcd
T2
Write a Python 3.12 function for GCD with invalid-input exceptions....
|
0
1298.0ms median
918µ$
|
100
4217.0ms median
260µ$
|
100
3006.0ms median
50µ$
|
100
1337.0ms median
558µ$
|
100
1188.0ms median
153µ$
|
100
3051.0ms median
153µ$
|
100
7020.0ms median
351µ$
|
0
4855.0ms median
243µ$
|
100
7905.0ms median
395µ$
|
100
78672.0ms median
3934µ$
|
0
4039.0ms median
202µ$
|
100
5997.0ms median
300µ$
|
0
5380.0ms median
269µ$
|
100
4998.0ms median
250µ$
|
100
5551.0ms median
278µ$
|
100
2082.0ms median
104µ$
|
100
4843.0ms median
242µ$
|
100
4716.0ms median
236µ$
|
0
4542.0ms median
227µ$
|
100
9782.0ms median
489µ$
|
-
|
-
|
100
6117.0ms median
306µ$
|
100
3876.0ms median
194µ$
|
-
|
-
|
100
2372.0ms median
119µ$
|
Python Letter Count in String
0303_python_letter_count
T2
Count occurrences of a target letter in a string....
|
0
842.0ms median
535µ$
|
100
2911.0ms median
320µ$
|
100
1463.0ms median
57µ$
|
100
1519.0ms median
549µ$
|
100
953.0ms median
181µ$
|
0
2393.0ms median
120µ$
|
100
7248.0ms median
362µ$
|
0
4930.0ms median
247µ$
|
100
9981.0ms median
499µ$
|
-
|
0
5179.0ms median
259µ$
|
100
5841.0ms median
292µ$
|
0
2522.0ms median
126µ$
|
100
4749.0ms median
237µ$
|
100
13036.0ms median
652µ$
|
100
1449.0ms median
72µ$
|
100
5333.0ms median
267µ$
|
0
5780.0ms median
289µ$
|
0
8038.0ms median
402µ$
|
0
8626.0ms median
431µ$
|
-
|
-
|
0
6000.0ms median
300µ$
|
0
2517.0ms median
126µ$
|
-
|
-
|
100
3031.0ms median
152µ$
|
Python Minimum Coin Change
0304_python_coin_change
T2
Compute minimum number of coins to make a target amount....
|
0
2083.0ms median
1422µ$
|
100
5037.0ms median
712µ$
|
0
4173.0ms median
121µ$
|
100
1460.0ms median
1510µ$
|
100
2309.0ms median
404µ$
|
0
2442.0ms median
122µ$
|
0
12101.0ms median
605µ$
|
0
7292.0ms median
365µ$
|
100
25978.0ms median
1299µ$
|
-
|
0
9764.0ms median
488µ$
|
0
10511.0ms median
526µ$
|
0
2352.0ms median
118µ$
|
0
7226.0ms median
361µ$
|
0
16252.0ms median
813µ$
|
0
5343.0ms median
267µ$
|
0
7930.0ms median
397µ$
|
100
9870.0ms median
494µ$
|
0
0.0ms median
6035µ$
1 latency outlier
|
0
21341.0ms median
1067µ$
|
-
|
-
|
100
12970.0ms median
649µ$
|
0
9989.0ms median
499µ$
|
-
|
-
|
0
2315.0ms median
116µ$
|
Python Prime Factorization
0305_python_prime_factorization
T2
Return the prime factorization of a positive integer....
|
0
1821.0ms median
850µ$
|
100
4098.0ms median
417µ$
|
100
2563.0ms median
80µ$
|
100
1029.0ms median
733µ$
|
100
1338.0ms median
211µ$
|
0
2760.0ms median
138µ$
|
100
8917.0ms median
446µ$
|
0
4337.0ms median
217µ$
|
100
13446.0ms median
672µ$
|
-
|
100
5588.0ms median
279µ$
|
100
10789.0ms median
539µ$
|
0
6498.0ms median
325µ$
|
0
6940.0ms median
347µ$
|
100
6034.0ms median
302µ$
|
0
2812.0ms median
141µ$
|
100
5473.0ms median
274µ$
|
100
6080.0ms median
304µ$
|
0
21967.0ms median
1098µ$
|
100
12348.0ms median
617µ$
|
-
|
-
|
100
6774.0ms median
339µ$
|
0
4004.0ms median
200µ$
|
-
|
-
|
0
2032.0ms median
102µ$
|