Benchmark Dashboard

Last updated: March 02, 2026 at 10:19

Models (results in each benchmark below follow this order)
Claude Haiku 4.5
GPT-5 mini
GPT-5 nano
Gemma 2 2B (LMStudio), 1500 MB
Gemma 2 9B (LMStudio), 5800 MB
Gemma 2B (LMStudio), 1500 MB
Gemma 3 12B (LMStudio), 8100 MB
Granite 3.2 8B (LMStudio), 4900 MB
Llama 2 7B (LMStudio), 4900 MB
Llama 3 8B (LMStudio), 4900 MB
Llama 3.1 8B (LMStudio), 4900 MB
Llama 3.2 1B (LMStudio), 1300 MB
Ministral 8B (LMStudio), 4900 MB
OLMo 3 7B (LMStudio), 4300 MB
Phi-3.5 Mini (LMStudio), 2500 MB
Qwen3 1.7B (LMStudio), 1100 MB
Qwen3 4B (LMStudio), 2800 MB
Qwen3 VL 8B (LMStudio), 5000 MB
SmolLM2 1.7B (LMStudio), 1100 MB
Word Length
0011_word_length
A benchmark to evaluate a model's ability to count the to...
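Model                      Score   Latency   Cost
Claude Haiku 4.5             100   742.5ms  730µ$
GPT-5 mini                   100  1341.8ms   80µ$
GPT-5 nano                    42   992.8ms   16µ$
Gemma 2 2B (LMStudio)         17  1644.2ms   82µ$
Gemma 2 9B (LMStudio)         60  2390.3ms  120µ$
Gemma 2B (LMStudio)           20  1256.9ms   63µ$
Gemma 3 12B (LMStudio)        85  3372.6ms  173µ$
Granite 3.2 8B (LMStudio)     22  1927.9ms   96µ$
Llama 2 7B (LMStudio)         27   570.5ms   29µ$
Llama 3 8B (LMStudio)         95  2631.9ms  132µ$
Llama 3.1 8B (LMStudio)       75  2754.4ms  138µ$
Llama 3.2 1B (LMStudio)       27   660.2ms   33µ$
Ministral 8B (LMStudio)       45  2443.2ms  122µ$
OLMo 3 7B (LMStudio)          82  2280.9ms  114µ$
Phi-3.5 Mini (LMStudio)       15  1196.8ms   60µ$
Qwen3 1.7B (LMStudio)         72  1265.8ms   63µ$
Qwen3 4B (LMStudio)           57  1859.3ms   93µ$
Qwen3 VL 8B (LMStudio)        47  4256.9ms  213µ$
SmolLM2 1.7B (LMStudio)       35   518.9ms   26µ$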
Letter Count
0012_letter_count
A benchmark to evaluate a model's ability to count how ma...
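Model                      Score   Latency   Cost
Claude Haiku 4.5              77   704.6ms  717µ$
GPT-5 mini                    47  1256.8ms   77µ$
GPT-5 nano                    42  1101.0ms   15µ$
Gemma 2 2B (LMStudio)         12   692.6ms   35µ$
Gemma 2 9B (LMStudio)         30   628.5ms   31µ$
Gemma 2B (LMStudio)           20   277.3ms   14µ$
Gemma 3 12B (LMStudio)        52  1398.2ms   70µ$
Granite 3.2 8B (LMStudio)     52   670.4ms   34µ$
Llama 2 7B (LMStudio)         15   534.9ms   27µ$
Llama 3 8B (LMStudio)         15   532.6ms   27µ$
Llama 3.1 8B (LMStudio)       35   560.5ms   28µ$
Llama 3.2 1B (LMStudio)       42   194.4ms   10µ$
Ministral 8B (LMStudio)       32   522.6ms   26µ$
OLMo 3 7B (LMStudio)          45   485.4ms   24µ$
Phi-3.5 Mini (LMStudio)       12   360.2ms   18µ$
Qwen3 1.7B (LMStudio)         22   187.9ms    9µ$
Qwen3 4B (LMStudio)           47   385.5ms   19µ$
Qwen3 VL 8B (LMStudio)        55  2607.5ms  130µ$
SmolLM2 1.7B (LMStudio)       20   190.7ms   10µ$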
Vowel Count
0013_vowel_count
Tests ability to count vowels (a, e, i, o, u and accented forms) in a word acros...
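Model                      Score   Latency   Cost
Claude Haiku 4.5              80   708.0ms  791µ$
GPT-5 mini                    87  1232.1ms   97µ$
GPT-5 nano                    55  1334.4ms   19µ$
Gemma 2 2B (LMStudio)         40   754.2ms   38µ$
Gemma 2 9B (LMStudio)         37   984.6ms   49µ$
Gemma 2B (LMStudio)           27   364.0ms   18µ$
Gemma 3 12B (LMStudio)        42  1589.7ms   80µ$
Granite 3.2 8B (LMStudio)     32   771.2ms   39µ$
Llama 2 7B (LMStudio)         27   699.3ms   35µ$
Llama 3 8B (LMStudio)         37   726.3ms   36µ$
Llama 3.1 8B (LMStudio)       42   764.7ms   38µ$
Llama 3.2 1B (LMStudio)       27   239.8ms   12µ$
Ministral 8B (LMStudio)       47   729.8ms   37µ$
OLMo 3 7B (LMStudio)          67   637.0ms   32µ$
Phi-3.5 Mini (LMStudio)        0   452.1ms   23µ$
Qwen3 1.7B (LMStudio)         45   248.4ms   12µ$
Qwen3 4B (LMStudio)           40   486.7ms   24µ$
Qwen3 VL 8B (LMStudio)        67  3062.4ms  153µ$
SmolLM2 1.7B (LMStudio)        7   248.2ms   12µ$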
Syllable Count
0014_syllable_count
Tests ability to count syllables in words across Latin-alphabet languages....
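Model                      Score   Latency   Cost
Claude Haiku 4.5              92   679.5ms  738µ$
GPT-5 mini                    90  1444.3ms   85µ$
GPT-5 nano                    62  1248.6ms   17µ$
Gemma 2 2B (LMStudio)         30   769.3ms   38µ$
Gemma 2 9B (LMStudio)         70  1045.7ms   52µ$
Gemma 2B (LMStudio)           22   357.9ms   18µ$
Gemma 3 12B (LMStudio)        47  1692.8ms   85µ$
Granite 3.2 8B (LMStudio)     37   893.3ms   45µ$
Llama 2 7B (LMStudio)         30   732.5ms   37µ$
Llama 3 8B (LMStudio)         75   724.1ms   36µ$
Llama 3.1 8B (LMStudio)       87   717.0ms   36µ$
Llama 3.2 1B (LMStudio)       42   240.1ms   12µ$
Ministral 8B (LMStudio)       72   695.6ms   35µ$
OLMo 3 7B (LMStudio)          62   661.5ms   33µ$
Phi-3.5 Mini (LMStudio)        5   440.4ms   22µ$
Qwen3 1.7B (LMStudio)         25   259.7ms   13µ$
Qwen3 4B (LMStudio)           62   499.4ms   25µ$
Qwen3 VL 8B (LMStudio)        27  3046.4ms  152µ$
SmolLM2 1.7B (LMStudio)       15   248.1ms   12µ$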
Spell Check
0015_spell_check
A benchmark to evaluate a model's ability to identify mis...
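Model                      Score   Latency   Cost
Claude Haiku 4.5             100   805.6ms  796µ$
GPT-5 mini                   100  1291.1ms   89µ$
GPT-5 nano                    95  1463.5ms   18µ$
Gemma 2 2B (LMStudio)         90   990.0ms   50µ$
Gemma 2 9B (LMStudio)        100  1398.9ms   70µ$
Gemma 2B (LMStudio)           55   559.9ms   28µ$
Gemma 3 12B (LMStudio)       100  2127.1ms  106µ$
Granite 3.2 8B (LMStudio)     95  1183.5ms   59µ$
Llama 2 7B (LMStudio)         37  1076.5ms   54µ$
Llama 3 8B (LMStudio)         97   970.7ms   49µ$
Llama 3.1 8B (LMStudio)       92   982.8ms   49µ$
Llama 3.2 1B (LMStudio)       40   339.5ms   17µ$
Ministral 8B (LMStudio)       90  1019.6ms   51µ$
OLMo 3 7B (LMStudio)          85   861.5ms   43µ$
Phi-3.5 Mini (LMStudio)       37   585.0ms   29µ$
Qwen3 1.7B (LMStudio)         75   295.0ms   15µ$
Qwen3 4B (LMStudio)           87   635.0ms   32µ$
Qwen3 VL 8B (LMStudio)       100  3319.7ms  166µ$
SmolLM2 1.7B (LMStudio)       25   325.9ms   16µ$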
Antonym Identification
0016_antonym
A benchmark to evaluate a model's ability to identify the...
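Model                      Score   Latency   Cost
Claude Haiku 4.5             100   646.8ms  723µ$
GPT-5 mini                   100  1251.0ms   80µ$
GPT-5 nano                   100  1216.5ms   16µ$
Gemma 2 2B (LMStudio)        100   740.1ms   37µ$
Gemma 2 9B (LMStudio)        100   935.1ms   47µ$
Gemma 2B (LMStudio)           90   390.8ms   20µ$
Gemma 3 12B (LMStudio)       100  1717.2ms   86µ$
Granite 3.2 8B (LMStudio)    100   731.3ms   37µ$
Llama 2 7B (LMStudio)         37   780.9ms   39µ$
Llama 3 8B (LMStudio)        100   667.2ms   33µ$
Llama 3.1 8B (LMStudio)      100   702.1ms   35µ$
Llama 3.2 1B (LMStudio)       62   263.1ms   13µ$
Ministral 8B (LMStudio)      100   663.1ms   33µ$
OLMo 3 7B (LMStudio)         100   633.1ms   32µ$
Phi-3.5 Mini (LMStudio)        7   505.5ms   25µ$
Qwen3 1.7B (LMStudio)         92   216.2ms   11µ$
Qwen3 4B (LMStudio)           97   440.2ms   22µ$
Qwen3 VL 8B (LMStudio)       100  2915.6ms  146µ$
SmolLM2 1.7B (LMStudio)       30   231.8ms   12µ$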
Multilingual Synonym Generation
0017_synonyms
A benchmark to evaluate a model's ability to generate noun synonyms ...
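Model                      Score   Latency   Cost
Claude Haiku 4.5              98   689.8ms  739µ$
GPT-5 mini                   100  1287.4ms   83µ$
GPT-5 nano                    96   865.5ms   17µ$
Gemma 2 2B (LMStudio)         76   839.6ms   42µ$
Gemma 2 9B (LMStudio)         96  1065.6ms   53µ$
Gemma 2B (LMStudio)           30   405.5ms   20µ$
Gemma 3 12B (LMStudio)        98  1848.6ms   92µ$
Granite 3.2 8B (LMStudio)     90  1106.2ms   55µ$
Llama 2 7B (LMStudio)         17   975.7ms   49µ$
Llama 3 8B (LMStudio)         78   875.4ms   44µ$
Llama 3.1 8B (LMStudio)       80   887.8ms   44µ$
Llama 3.2 1B (LMStudio)       21   262.1ms   13µ$
Ministral 8B (LMStudio)       69   752.2ms   38µ$
OLMo 3 7B (LMStudio)          76   840.1ms   42µ$
Phi-3.5 Mini (LMStudio)        1   611.6ms   31µ$
Qwen3 1.7B (LMStudio)         88   264.2ms   13µ$
Qwen3 4B (LMStudio)           96   537.9ms   27µ$
Qwen3 VL 8B (LMStudio)       100  3160.9ms  158µ$
SmolLM2 1.7B (LMStudio)       15   304.2ms   15µ$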
Pinyin Letter Count
0018_pinyin_letters
A benchmark to evaluate a model's ability to count how many times a s...
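Model                      Score   Latency   Cost
Claude Haiku 4.5              35   819.4ms  879µ$
GPT-5 mini                    35  1116.7ms  119µ$
GPT-5 nano                    15  1048.0ms   24µ$
Gemma 2 2B (LMStudio)         30   812.1ms   41µ$
Gemma 2 9B (LMStudio)         30   902.9ms   45µ$
Gemma 2B (LMStudio)           60   351.2ms   18µ$
Gemma 3 12B (LMStudio)        50  1847.8ms   92µ$
Granite 3.2 8B (LMStudio)     25   787.5ms   39µ$
Llama 2 7B (LMStudio)         15   735.8ms   37µ$
Llama 3 8B (LMStudio)         25   600.8ms   30µ$
Llama 3.1 8B (LMStudio)       15   587.9ms   29µ$
Llama 3.2 1B (LMStudio)       10   212.5ms   11µ$
Ministral 8B (LMStudio)       15   624.6ms   31µ$
OLMo 3 7B (LMStudio)          20   531.7ms   27µ$
Phi-3.5 Mini (LMStudio)        0   827.1ms   41µ$
Qwen3 1.7B (LMStudio)          5   237.4ms   12µ$
Qwen3 4B (LMStudio)            5   443.6ms   22µ$
Qwen3 VL 8B (LMStudio)        20  3472.7ms  174µ$
SmolLM2 1.7B (LMStudio)       35   236.0ms   12µ$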
Simple Arithmetic
0021_simple_arithmetic
A benchmark to evaluate a model's ability to perform basic arithmeti...
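Model                      Score   Latency   Cost
Claude Haiku 4.5             100   682.2ms  703µ$
GPT-5 mini                   100  1101.0ms   73µ$
GPT-5 nano                   100   936.5ms   15µ$
Gemma 2 2B (LMStudio)         97   707.1ms   35µ$
Gemma 2 9B (LMStudio)        100   678.7ms   34µ$
Gemma 2B (LMStudio)           97   309.9ms   16µ$
Gemma 3 12B (LMStudio)       100  1564.2ms   78µ$
Granite 3.2 8B (LMStudio)    100   642.0ms   32µ$
Llama 2 7B (LMStudio)         57   528.7ms   26µ$
Llama 3 8B (LMStudio)        100   552.4ms   28µ$
Llama 3.1 8B (LMStudio)       90   603.8ms   30µ$
Llama 3.2 1B (LMStudio)      100   188.5ms    9µ$
Ministral 8B (LMStudio)      100   640.6ms   32µ$
OLMo 3 7B (LMStudio)         100   467.8ms   23µ$
Phi-3.5 Mini (LMStudio)       10   354.2ms   18µ$
Qwen3 1.7B (LMStudio)        100   201.3ms   10µ$
Qwen3 4B (LMStudio)          100   413.5ms   21µ$
Qwen3 VL 8B (LMStudio)       100  3133.3ms  157µ$
SmolLM2 1.7B (LMStudio)       95   191.7ms   10µ$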
Unit Conversion
0022_unit_conversion
A benchmark to evaluate a model's ability to accurately convert ...
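Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                   100  1179.1ms   80µ$
GPT-5 nano                    72  1141.5ms   16µ$
Gemma 2 2B (LMStudio)         25   772.3ms   39µ$
Gemma 2 9B (LMStudio)         97  1128.2ms   56µ$
Gemma 2B (LMStudio)           12   349.1ms   17µ$
Gemma 3 12B (LMStudio)        95  1948.9ms   97µ$
Granite 3.2 8B (LMStudio)     87  1006.1ms   50µ$
Llama 2 7B (LMStudio)         17   900.4ms   45µ$
Llama 3 8B (LMStudio)         80   646.0ms   32µ$
Llama 3.1 8B (LMStudio)       85   647.1ms   32µ$
Llama 3.2 1B (LMStudio)       22   229.6ms   12µ$
Ministral 8B (LMStudio)       85   864.5ms   43µ$
OLMo 3 7B (LMStudio)          97   641.5ms   32µ$
Phi-3.5 Mini (LMStudio)        5   458.0ms   23µ$
Qwen3 1.7B (LMStudio)         67   322.5ms   16µ$
Qwen3 4B (LMStudio)           85   660.5ms   33µ$
Qwen3 VL 8B (LMStudio)        95  3214.4ms  161µ$
SmolLM2 1.7B (LMStudio)       50   253.3ms   13µ$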
Math Word Problems
0023_word_problems
A benchmark to evaluate a model's ability to read math word problems...
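Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    97  1118.8ms   80µ$
GPT-5 nano                    95  1056.9ms   16µ$
Gemma 2 2B (LMStudio)         75   676.9ms   34µ$
Gemma 2 9B (LMStudio)        100   897.4ms   45µ$
Gemma 2B (LMStudio)           67   340.9ms   17µ$
Gemma 3 12B (LMStudio)       100  1497.3ms   75µ$
Granite 3.2 8B (LMStudio)     90   786.5ms   39µ$
Llama 2 7B (LMStudio)         67   650.5ms   33µ$
Llama 3 8B (LMStudio)        100   610.9ms   31µ$
Llama 3.1 8B (LMStudio)       70   606.1ms   30µ$
Llama 3.2 1B (LMStudio)       75   190.4ms   10µ$
Ministral 8B (LMStudio)      100   650.8ms   33µ$
OLMo 3 7B (LMStudio)         100   474.7ms   24µ$
Phi-3.5 Mini (LMStudio)       20   416.9ms   21µ$
Qwen3 1.7B (LMStudio)         92   222.6ms   11µ$
Qwen3 4B (LMStudio)           92   443.4ms   22µ$
Qwen3 VL 8B (LMStudio)       100  3032.8ms  152µ$
SmolLM2 1.7B (LMStudio)       62   213.8ms   11µ$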
Fractions and Percentages
0024_percentage_math
A benchmark to evaluate a model's ability to calculate percentages a...
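Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                   100  1172.5ms   74µ$
GPT-5 nano                   100   918.6ms   15µ$
Gemma 2 2B (LMStudio)         90   673.3ms   34µ$
Gemma 2 9B (LMStudio)         97   736.6ms   37µ$
Gemma 2B (LMStudio)           52   293.4ms   15µ$
Gemma 3 12B (LMStudio)       100  1448.0ms   72µ$
Granite 3.2 8B (LMStudio)    100   741.5ms   37µ$
Llama 2 7B (LMStudio)         47   643.8ms   32µ$
Llama 3 8B (LMStudio)         90   551.6ms   28µ$
Llama 3.1 8B (LMStudio)      100   612.5ms   31µ$
Llama 3.2 1B (LMStudio)       50   192.7ms   10µ$
Ministral 8B (LMStudio)       95   617.3ms   31µ$
OLMo 3 7B (LMStudio)          97   484.1ms   24µ$
Phi-3.5 Mini (LMStudio)        5   396.2ms   20µ$
Qwen3 1.7B (LMStudio)         95   208.8ms   10µ$
Qwen3 4B (LMStudio)          100   424.9ms   21µ$
Qwen3 VL 8B (LMStudio)       100  2990.4ms  150µ$
SmolLM2 1.7B (LMStudio)       67   216.6ms   11µ$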
Algebra
0025_algebra
A benchmark to evaluate a model's ability to solve linear and quadra...
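Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    97  1072.6ms   79µ$
GPT-5 nano                    80  1036.0ms   16µ$
Gemma 2 2B (LMStudio)         42   735.3ms   37µ$
Gemma 2 9B (LMStudio)         72  1003.0ms   50µ$
Gemma 2B (LMStudio)           30   332.6ms   17µ$
Gemma 3 12B (LMStudio)        85  1531.8ms   77µ$
Granite 3.2 8B (LMStudio)     45   793.1ms   40µ$
Llama 2 7B (LMStudio)         10   717.7ms   37µ$
Llama 3 8B (LMStudio)         47   700.1ms   35µ$
Llama 3.1 8B (LMStudio)       55   746.2ms   37µ$
Llama 3.2 1B (LMStudio)       42   227.4ms   11µ$
Ministral 8B (LMStudio)       47   624.3ms   31µ$
OLMo 3 7B (LMStudio)          62   613.6ms   31µ$
Phi-3.5 Mini (LMStudio)        2   433.4ms   22µ$
Qwen3 1.7B (LMStudio)         60   223.2ms   11µ$
Qwen3 4B (LMStudio)           65   425.2ms   21µ$
Qwen3 VL 8B (LMStudio)       100  3017.0ms  151µ$
SmolLM2 1.7B (LMStudio)       35   238.7ms   12µ$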
Time Arithmetic
0026_time_arithmetic
A benchmark to evaluate a model's ability to add and subtract ...
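Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                   100  1179.0ms   82µ$
GPT-5 nano                    82  1085.4ms   16µ$
Gemma 2 2B (LMStudio)         35   882.1ms   44µ$
Gemma 2 9B (LMStudio)         70  1012.1ms   51µ$
Gemma 2B (LMStudio)            7   442.7ms   22µ$
Gemma 3 12B (LMStudio)        70  1777.7ms   89µ$
Granite 3.2 8B (LMStudio)     60   933.4ms   47µ$
Llama 2 7B (LMStudio)         17   869.1ms   43µ$
Llama 3 8B (LMStudio)         65   611.2ms   31µ$
Llama 3.1 8B (LMStudio)       55   711.8ms   36µ$
Llama 3.2 1B (LMStudio)        2   315.4ms   16µ$
Ministral 8B (LMStudio)       55   699.7ms   35µ$
OLMo 3 7B (LMStudio)          65   540.1ms   27µ$
Phi-3.5 Mini (LMStudio)        0   454.2ms   23µ$
Qwen3 1.7B (LMStudio)         27   263.3ms   13µ$
Qwen3 4B (LMStudio)           65   495.2ms   25µ$
Qwen3 VL 8B (LMStudio)        75  3380.5ms  169µ$
SmolLM2 1.7B (LMStudio)        5   258.1ms   13µ$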
Geometry
0027_geometry
A benchmark to evaluate a model's ability to calculate area, perimet...
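Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                   100  1144.2ms   76µ$
GPT-5 nano                    97   965.0ms   15µ$
Gemma 2 2B (LMStudio)         67   684.8ms   34µ$
Gemma 2 9B (LMStudio)         95   837.9ms   42µ$
Gemma 2B (LMStudio)           30   335.1ms   17µ$
Gemma 3 12B (LMStudio)        95  1529.5ms   77µ$
Granite 3.2 8B (LMStudio)     72   789.0ms   39µ$
Llama 2 7B (LMStudio)         20   643.2ms   32µ$
Llama 3 8B (LMStudio)         82   599.6ms   30µ$
Llama 3.1 8B (LMStudio)       60   651.1ms   33µ$
Llama 3.2 1B (LMStudio)       40   201.6ms   10µ$
Ministral 8B (LMStudio)       82   670.8ms   34µ$
OLMo 3 7B (LMStudio)          95   521.3ms   26µ$
Phi-3.5 Mini (LMStudio)        5   387.3ms   19µ$
Qwen3 1.7B (LMStudio)         80   223.0ms   11µ$
Qwen3 4B (LMStudio)           97   453.9ms   23µ$
Qwen3 VL 8B (LMStudio)       100  3113.6ms  156µ$
SmolLM2 1.7B (LMStudio)       45   228.0ms   11µ$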
Translation en_fr
0101_translation_en_fr
A benchmark to evaluate a model's ability to translate ...
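Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    95  1141.8ms  118µ$
GPT-5 nano                    85  1065.3ms   24µ$
Gemma 2 2B (LMStudio)         72  1747.2ms   87µ$
Gemma 2 9B (LMStudio)         87  1680.5ms   86µ$
Gemma 2B (LMStudio)           52   387.8ms   19µ$
Gemma 3 12B (LMStudio)        87  1964.0ms   98µ$
Granite 3.2 8B (LMStudio)     87  1123.4ms   56µ$
Llama 2 7B (LMStudio)         27  2466.6ms  123µ$
Llama 3 8B (LMStudio)         90   880.3ms   44µ$
Llama 3.1 8B (LMStudio)       85   837.9ms   42µ$
Llama 3.2 1B (LMStudio)       47   253.6ms   13µ$
Ministral 8B (LMStudio)       87   760.5ms   38µ$
OLMo 3 7B (LMStudio)          82   730.7ms   37µ$
Phi-3.5 Mini (LMStudio)       15   915.9ms   46µ$
Qwen3 1.7B (LMStudio)         87   229.8ms   12µ$
Qwen3 4B (LMStudio)           90   505.2ms   25µ$
Qwen3 VL 8B (LMStudio)        92  3947.1ms  197µ$
SmolLM2 1.7B (LMStudio)       32   307.7ms   15µ$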
Translation en_es
0102_translation_en_es
A benchmark to evaluate a model's ability to translate ...
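Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    95  1151.5ms  118µ$
GPT-5 nano                    97   983.9ms   24µ$
Gemma 2 2B (LMStudio)         90   772.9ms   39µ$
Gemma 2 9B (LMStudio)         97  1142.9ms   57µ$
Gemma 2B (LMStudio)           55   372.2ms   19µ$
Gemma 3 12B (LMStudio)        95  1898.5ms   95µ$
Granite 3.2 8B (LMStudio)     92  1118.0ms   56µ$
Llama 2 7B (LMStudio)         17   979.2ms   49µ$
Llama 3 8B (LMStudio)         95   877.1ms   44µ$
Llama 3.1 8B (LMStudio)       97   825.4ms   41µ$
Llama 3.2 1B (LMStudio)       40   260.3ms   13µ$
Ministral 8B (LMStudio)       92   751.3ms   38µ$
OLMo 3 7B (LMStudio)          75   715.2ms   36µ$
Phi-3.5 Mini (LMStudio)       10   738.2ms   37µ$
Qwen3 1.7B (LMStudio)         85   220.6ms   11µ$
Qwen3 4B (LMStudio)           95   506.1ms   25µ$
Qwen3 VL 8B (LMStudio)        95  3892.2ms  195µ$
SmolLM2 1.7B (LMStudio)       32   302.1ms   15µ$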
Translation en_de
0103_translation_en_de
A benchmark to evaluate a model's ability to translate ...
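Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    95  1067.5ms  119µ$
GPT-5 nano                    92   966.7ms   24µ$
Gemma 2 2B (LMStudio)         85   792.1ms   40µ$
Gemma 2 9B (LMStudio)         90  1135.8ms   57µ$
Gemma 2B (LMStudio)           47   380.4ms   19µ$
Gemma 3 12B (LMStudio)        92  1924.6ms   96µ$
Granite 3.2 8B (LMStudio)     92  1144.9ms   57µ$
Llama 2 7B (LMStudio)         17  1009.8ms   51µ$
Llama 3 8B (LMStudio)         85   924.0ms   46µ$
Llama 3.1 8B (LMStudio)       92   867.7ms   43µ$
Llama 3.2 1B (LMStudio)       45   266.1ms   13µ$
Ministral 8B (LMStudio)       87   808.4ms   40µ$
OLMo 3 7B (LMStudio)          92   766.2ms   38µ$
Phi-3.5 Mini (LMStudio)        7  1060.1ms   53µ$
Qwen3 1.7B (LMStudio)         92   237.1ms   12µ$
Qwen3 4B (LMStudio)           95   526.8ms   26µ$
Qwen3 VL 8B (LMStudio)        95  4042.6ms  202µ$
SmolLM2 1.7B (LMStudio)       25   306.2ms   15µ$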
Translation fr_es
0104_translation_fr_es
A benchmark to evaluate a model's ability to translate ...
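Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    97  1061.2ms  118µ$
GPT-5 nano                    85   994.7ms   24µ$
Gemma 2 2B (LMStudio)         75   764.8ms   38µ$
Gemma 2 9B (LMStudio)         82  1089.3ms   54µ$
Gemma 2B (LMStudio)           45   370.5ms   19µ$
Gemma 3 12B (LMStudio)        80  1867.8ms   93µ$
Granite 3.2 8B (LMStudio)     85  1109.8ms   56µ$
Llama 2 7B (LMStudio)         12   953.8ms   49µ$
Llama 3 8B (LMStudio)         82   877.3ms   44µ$
Llama 3.1 8B (LMStudio)       80   807.8ms   40µ$
Llama 3.2 1B (LMStudio)       30   253.3ms   13µ$
Ministral 8B (LMStudio)       80   726.3ms   36µ$
OLMo 3 7B (LMStudio)          67   739.1ms   37µ$
Phi-3.5 Mini (LMStudio)        0   841.7ms   42µ$
Qwen3 1.7B (LMStudio)         82   224.7ms   11µ$
Qwen3 4B (LMStudio)           80   493.1ms   25µ$
Qwen3 VL 8B (LMStudio)        87  3800.8ms  190µ$
SmolLM2 1.7B (LMStudio)       35   300.0ms   15µ$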
Translation en_zh
0105_translation_en_zh
A benchmark to evaluate a model's ability to translate ...
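Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    97  1008.6ms  119µ$
GPT-5 nano                    90   945.1ms   24µ$
Gemma 2 2B (LMStudio)         92   785.1ms   39µ$
Gemma 2 9B (LMStudio)         92  1083.9ms   54µ$
Gemma 2B (LMStudio)           72   377.1ms   19µ$
Gemma 3 12B (LMStudio)        97  1904.4ms   95µ$
Granite 3.2 8B (LMStudio)     97  1108.4ms   55µ$
Llama 2 7B (LMStudio)         15  1098.6ms   55µ$
Llama 3 8B (LMStudio)         80   881.4ms   44µ$
Llama 3.1 8B (LMStudio)       47   944.7ms   47µ$
Llama 3.2 1B (LMStudio)       47   266.4ms   13µ$
Ministral 8B (LMStudio)       92   864.5ms   43µ$
OLMo 3 7B (LMStudio)          87   884.1ms   44µ$
Phi-3.5 Mini (LMStudio)        5   948.9ms   47µ$
Qwen3 1.7B (LMStudio)         95   225.5ms   11µ$
Qwen3 4B (LMStudio)           97   512.0ms   26µ$
Qwen3 VL 8B (LMStudio)       100  4150.1ms  208µ$
SmolLM2 1.7B (LMStudio)       42   364.5ms   18µ$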
Translation en_ja
0106_translation_en_ja
A benchmark to evaluate a model's ability to translate ...
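Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    97  1053.0ms  120µ$
GPT-5 nano                    85  1176.0ms   24µ$
Gemma 2 2B (LMStudio)         85   810.3ms   41µ$
Gemma 2 9B (LMStudio)         70  1137.3ms   57µ$
Gemma 2B (LMStudio)           60   386.9ms   19µ$
Gemma 3 12B (LMStudio)        90  1942.6ms   97µ$
Granite 3.2 8B (LMStudio)     87  1111.8ms   56µ$
Llama 2 7B (LMStudio)         22  1060.1ms   53µ$
Llama 3 8B (LMStudio)         67   878.9ms   44µ$
Llama 3.1 8B (LMStudio)       75   880.4ms   44µ$
Llama 3.2 1B (LMStudio)       30   255.7ms   13µ$
Ministral 8B (LMStudio)       82   793.4ms   40µ$
OLMo 3 7B (LMStudio)          75   913.2ms   46µ$
Phi-3.5 Mini (LMStudio)       10   956.1ms   48µ$
Qwen3 1.7B (LMStudio)         82   241.7ms   12µ$
Qwen3 4B (LMStudio)           95   546.8ms   27µ$
Qwen3 VL 8B (LMStudio)       100  4068.3ms  203µ$
SmolLM2 1.7B (LMStudio)       35   343.4ms   17µ$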
Translation fr_ko
0107_translation_fr_ko
A benchmark to evaluate a model's ability to translate ...
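Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                   100  1145.7ms  121µ$
GPT-5 nano                    90   971.7ms   24µ$
Gemma 2 2B (LMStudio)         80   849.1ms   42µ$
Gemma 2 9B (LMStudio)         85  1228.5ms   61µ$
Gemma 2B (LMStudio)           47   404.3ms   20µ$
Gemma 3 12B (LMStudio)        80  2034.4ms  102µ$
Granite 3.2 8B (LMStudio)     82  1064.5ms   53µ$
Llama 2 7B (LMStudio)         15  1212.0ms   61µ$
Llama 3 8B (LMStudio)         75   909.7ms   46µ$
Llama 3.1 8B (LMStudio)       72   844.4ms   42µ$
Llama 3.2 1B (LMStudio)       17   300.8ms   15µ$
Ministral 8B (LMStudio)       70   746.8ms   37µ$
OLMo 3 7B (LMStudio)          52   926.9ms   46µ$
Phi-3.5 Mini (LMStudio)        7  1104.2ms   55µ$
Qwen3 1.7B (LMStudio)         77   236.9ms   12µ$
Qwen3 4B (LMStudio)           92   533.7ms   27µ$
Qwen3 VL 8B (LMStudio)        97  4012.7ms  201µ$
SmolLM2 1.7B (LMStudio)       15   411.0ms   21µ$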
Translation it_lt
0108_translation_it_lt
A benchmark to evaluate a model's ability to translate ...
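Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    92  1088.0ms  122µ$
GPT-5 nano                    85   979.0ms   25µ$
Gemma 2 2B (LMStudio)         65   865.8ms   43µ$
Gemma 2 9B (LMStudio)         85  1280.7ms   64µ$
Gemma 2B (LMStudio)           15   397.4ms   20µ$
Gemma 3 12B (LMStudio)        87  2105.2ms  105µ$
Granite 3.2 8B (LMStudio)     55  1157.8ms   58µ$
Llama 2 7B (LMStudio)         12  1042.0ms   52µ$
Llama 3 8B (LMStudio)         65   963.4ms   48µ$
Llama 3.1 8B (LMStudio)       67   972.7ms   49µ$
Llama 3.2 1B (LMStudio)       10   288.1ms   14µ$
Ministral 8B (LMStudio)       67   815.8ms   41µ$
OLMo 3 7B (LMStudio)          30   934.1ms   47µ$
Phi-3.5 Mini (LMStudio)        0  1081.5ms   54µ$
Qwen3 1.7B (LMStudio)         60   266.2ms   13µ$
Qwen3 4B (LMStudio)           77   585.1ms   29µ$
Qwen3 VL 8B (LMStudio)        92  4101.7ms  205µ$
SmolLM2 1.7B (LMStudio)       17   329.6ms   17µ$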
Translation ja_lt
0109_translation_ja_lt
A benchmark to evaluate a model's ability to translate ...
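Model                      Score   Latency   Cost
Claude Haiku 4.5               -         -      -
GPT-5 mini                    97  1099.9ms  122µ$
GPT-5 nano                    97  1002.5ms   24µ$
Gemma 2 2B (LMStudio)         65   860.4ms   43µ$
Gemma 2 9B (LMStudio)         92  1265.5ms   63µ$
Gemma 2B (LMStudio)           22   393.3ms   20µ$
Gemma 3 12B (LMStudio)       100  2040.9ms  102µ$
Granite 3.2 8B (LMStudio)     55  1152.4ms   58µ$
Llama 2 7B (LMStudio)         15  1017.3ms   51µ$
Llama 3 8B (LMStudio)         75   969.4ms   48µ$
Llama 3.1 8B (LMStudio)       75   951.0ms   48µ$
Llama 3.2 1B (LMStudio)       20   315.9ms   16µ$
Ministral 8B (LMStudio)       67   810.5ms   41µ$
OLMo 3 7B (LMStudio)          27   912.7ms   46µ$
Phi-3.5 Mini (LMStudio)        7  1029.2ms   51µ$
Qwen3 1.7B (LMStudio)         75   256.1ms   13µ$
Qwen3 4B (LMStudio)           90   580.8ms   29µ$
Qwen3 VL 8B (LMStudio)        97  4266.3ms  213µ$
SmolLM2 1.7B (LMStudio)       15   318.9ms   16µ$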