Run Details #250
Model: Gemma 2B (LMStudio)
Score: 7
Correct (≥100): 3/40
Incorrect (<100): 37
Avg Time: 442.7ms
Tokens Used: 4625
Cost: 886µ$
Benchmark: 0026_time_arithmetic
Run Date: 2026-02-28 15:35:54
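The summary figures appear to be aggregates of the per-question `LLMUsage` records logged below. For example, summing `tokens_in + tokens_out` over the 40 logged responses gives 3961 + 664 = 4625, which matches Tokens Used. The sketch below recomputes these aggregates under stated assumptions. The `Record` type and `summarize` function are hypothetical. Grading by exact string comparison of `structured_data['time']` against Expected is an assumed rule. The score formula (percentage correct truncated to an integer, so 3/40 = 7.5% displays as 7) is inferred from the numbers and is not documented in the run.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical per-question record mirroring fields logged in each Response."""
    expected: str      # the "Expected" answer
    provided: str      # structured_data['time'] returned by the model
    tokens_in: int
    tokens_out: int
    cost: float        # USD
    total_msec: float


def summarize(records: list[Record]) -> dict:
    """Recompute run-level aggregates from per-question records."""
    n = len(records)
    # Assumed grading rule: exact string match on the time field.
    correct = sum(r.provided == r.expected for r in records)
    return {
        "correct": f"{correct}/{n}",
        "incorrect": n - correct,
        # Inferred, not documented: percent correct truncated to an int (3/40 -> 7).
        "score": 100 * correct // n,
        "avg_time_ms": round(sum(r.total_msec for r in records) / n, 1),
        "tokens_used": sum(r.tokens_in + r.tokens_out for r in records),
        "cost_micro_usd": round(sum(r.cost for r in records) * 1_000_000),
    }


# Usage: four records copied from the log below (one incorrect, three correct).
sample = [
    Record("21:20", "20:00", 99, 16, 2.7857150000000005e-05, 557.143),
    Record("19:55", "19:55", 99, 16, 1.8777e-05, 375.54),
    Record("23:45", "23:45", 99, 16, 2.151875e-05, 430.375),
    Record("19:15", "19:15", 98, 16, 2.02468e-05, 404.936),
]
print(summarize(sample))
```

Truncating rather than rounding is the only reading consistent with a displayed score of 7 for 3 correct answers out of 40; if the harness rounds, the formula would need adjusting.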
Questions (40)
Question: Starting at 22:00, what time was it 40 minutes earlier? Return HH:MM in 24-hour format.
Expected
21:20
Provided
Response(response_text='', structured_data={'time': '20:00'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.7857150000000005e-05, total_msec=557.143, metadata={}), additional_thought=None)
Question: Starting at 16:00, what time was it 10 minutes earlier? Return HH:MM in 24-hour format.
Expected
15:50
Provided
Response(response_text='', structured_data={'time': '15:00'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.18416e-05, total_msec=436.832, metadata={}), additional_thought=None)
Question: Starting at 16:35, what time was it 40 minutes earlier? Return HH:MM in 24-hour format.
Expected
15:55
Provided
Response(response_text='', structured_data={'time': '16:35 - 40 min = 16:55'}, usage=LLMUsage(tokens_in=99, tokens_out=28, cost=3.09183e-05, total_msec=618.366, metadata={}), additional_thought=None)
Question: Starting at 19:40, what time is it after 15 minutes? Return HH:MM in 24-hour format.
Expected
19:55
Provided
Response(response_text='', structured_data={'time': '19:55'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=1.8777e-05, total_msec=375.53999999999996, metadata={}), additional_thought=None)
Question: Starting at 02:55, what time was it 15 minutes earlier? Return HH:MM in 24-hour format.
Expected
02:40
Provided
Response(response_text='', structured_data={'time': '01:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=1.867035e-05, total_msec=373.407, metadata={}), additional_thought=None)
Question: Starting at 21:20, what time is it after 35 minutes? Return HH:MM in 24-hour format.
Expected
21:55
Provided
Response(response_text='', structured_data={'time': '22:50'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.20732e-05, total_msec=441.464, metadata={}), additional_thought=None)
Question: Starting at 08:30, what time was it 50 minutes earlier? Return HH:MM in 24-hour format.
Expected
07:40
Provided
Response(response_text='', structured_data={'time': '16:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.19299e-05, total_msec=438.598, metadata={}), additional_thought=None)
Question: Starting at 04:35, what time is it after 120 minutes? Return HH:MM in 24-hour format.
Expected
06:35
Provided
Response(response_text='', structured_data={'time': '16:35'}, usage=LLMUsage(tokens_in=100, tokens_out=16, cost=2.1939250000000002e-05, total_msec=438.78499999999997, metadata={}), additional_thought=None)
Question: Starting at 04:25, what time was it 50 minutes earlier? Return HH:MM in 24-hour format.
Expected
03:35
Provided
Response(response_text='', structured_data={'time': '19:25'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.1390850000000002e-05, total_msec=427.817, metadata={}), additional_thought=None)
Question: Starting at 04:45, what time is it after 15 minutes? Return HH:MM in 24-hour format.
Expected
05:00
Provided
Response(response_text='', structured_data={'time': '16:00'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.128015e-05, total_msec=425.603, metadata={}), additional_thought=None)
Question: Starting at 12:05, what time was it 5 minutes earlier? Return HH:MM in 24-hour format.
Expected
12:00
Provided
Response(response_text='', structured_data={'time': '11:50'}, usage=LLMUsage(tokens_in=98, tokens_out=16, cost=2.200355e-05, total_msec=440.07099999999997, metadata={}), additional_thought=None)
Question: Starting at 15:35, what time was it 75 minutes earlier? Return HH:MM in 24-hour format.
Expected
14:20
Provided
Response(response_text='', structured_data={'time': '08:50'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.09786e-05, total_msec=419.572, metadata={}), additional_thought=None)
Question: Starting at 03:55, what time was it 10 minutes earlier? Return HH:MM in 24-hour format.
Expected
03:45
Provided
Response(response_text='', structured_data={'time': '03:40'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.024415e-05, total_msec=404.883, metadata={}), additional_thought=None)
Question: Starting at 02:50, what time was it 50 minutes earlier? Return HH:MM in 24-hour format.
Expected
02:00
Provided
Response(response_text='', structured_data={'time': '08:00'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.175455e-05, total_msec=435.091, metadata={}), additional_thought=None)
Question: Starting at 10:25, what time was it 15 minutes earlier? Return HH:MM in 24-hour format.
Expected
10:10
Provided
Response(response_text='', structured_data={'time': '19:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.168635e-05, total_msec=433.727, metadata={}), additional_thought=None)
Question: Starting at 04:55, what time was it 90 minutes earlier? Return HH:MM in 24-hour format.
Expected
03:25
Provided
Response(response_text='', structured_data={'time': '17:00'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.142035e-05, total_msec=428.407, metadata={}), additional_thought=None)
Question: Starting at 21:10, what time is it after 60 minutes? Return HH:MM in 24-hour format.
Expected
22:10
Provided
Response(response_text='', structured_data={'time': '23:10'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.153585e-05, total_msec=430.71700000000004, metadata={}), additional_thought=None)
Question: Starting at 19:50, what time is it after 90 minutes? Return HH:MM in 24-hour format.
Expected
21:20
Provided
Response(response_text='', structured_data={'time': '19:50 + 90 min = 20:40'}, usage=LLMUsage(tokens_in=99, tokens_out=28, cost=3.488075e-05, total_msec=697.615, metadata={}), additional_thought=None)
Question: Starting at 22:45, what time is it after 60 minutes? Return HH:MM in 24-hour format.
Expected
23:45
Provided
Response(response_text='', structured_data={'time': '23:45'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.151875e-05, total_msec=430.375, metadata={}), additional_thought=None)
Question: Starting at 02:45, what time is it after 40 minutes? Return HH:MM in 24-hour format.
Expected
03:25
Provided
Response(response_text='', structured_data={'time': '16:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.19418e-05, total_msec=438.836, metadata={}), additional_thought=None)
Question: Starting at 09:55, what time was it 135 minutes earlier? Return HH:MM in 24-hour format.
Expected
07:40
Provided
Response(response_text='', structured_data={'time': '10:50'}, usage=LLMUsage(tokens_in=100, tokens_out=16, cost=2.194785e-05, total_msec=438.957, metadata={}), additional_thought=None)
Question: Starting at 09:20, what time is it after 45 minutes? Return HH:MM in 24-hour format.
Expected
10:05
Provided
Response(response_text='', structured_data={'time': '17:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.1335200000000002e-05, total_msec=426.704, metadata={}), additional_thought=None)
Question: Starting at 19:10, what time is it after 5 minutes? Return HH:MM in 24-hour format.
Expected
19:15
Provided
Response(response_text='', structured_data={'time': '19:15'}, usage=LLMUsage(tokens_in=98, tokens_out=16, cost=2.02468e-05, total_msec=404.93600000000004, metadata={}), additional_thought=None)
Question: Starting at 03:15, what time is it after 90 minutes? Return HH:MM in 24-hour format.
Expected
04:45
Provided
Response(response_text='', structured_data={'time': '18:15'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.21297e-05, total_msec=442.594, metadata={}), additional_thought=None)
Question: Starting at 18:55, what time is it after 35 minutes? Return HH:MM in 24-hour format.
Expected
19:30
Provided
Response(response_text='', structured_data={'time': '19:10'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.19463e-05, total_msec=438.926, metadata={}), additional_thought=None)
Question: Starting at 22:50, what time is it after 90 minutes? Return HH:MM in 24-hour format.
Expected
00:20
Provided
Response(response_text='', structured_data={'time': '23:40'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.161525e-05, total_msec=432.305, metadata={}), additional_thought=None)
Question: Starting at 06:40, what time is it after 90 minutes? Return HH:MM in 24-hour format.
Expected
08:10
Provided
Response(response_text='', structured_data={'time': '18:40'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.04037e-05, total_msec=408.074, metadata={}), additional_thought=None)
Question: Starting at 23:40, what time was it 135 minutes earlier? Return HH:MM in 24-hour format.
Expected
21:25
Provided
Response(response_text='', structured_data={'time': '21:10'}, usage=LLMUsage(tokens_in=100, tokens_out=16, cost=2.187055e-05, total_msec=437.411, metadata={}), additional_thought=None)
Question: Starting at 01:15, what time was it 20 minutes earlier? Return HH:MM in 24-hour format.
Expected
00:55
Provided
Response(response_text='', structured_data={'time': '09:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.19764e-05, total_msec=439.52799999999996, metadata={}), additional_thought=None)
Question: Starting at 17:50, what time was it 75 minutes earlier? Return HH:MM in 24-hour format.
Expected
16:35
Provided
Response(response_text='', structured_data={'time': '17:15'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.2230050000000002e-05, total_msec=444.601, metadata={}), additional_thought=None)
Question: Starting at 16:05, what time was it 45 minutes earlier? Return HH:MM in 24-hour format.
Expected
15:20
Provided
Response(response_text='', structured_data={'time': '08:00'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.1750350000000004e-05, total_msec=435.007, metadata={}), additional_thought=None)
Question: Starting at 00:45, what time was it 20 minutes earlier? Return HH:MM in 24-hour format.
Expected
00:25
Provided
Response(response_text='', structured_data={'time': '18:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.1418750000000002e-05, total_msec=428.375, metadata={}), additional_thought=None)
Question: Starting at 03:35, what time was it 50 minutes earlier? Return HH:MM in 24-hour format.
Expected
02:45
Provided
Response(response_text='', structured_data={'time': '17:35'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.1692400000000003e-05, total_msec=433.848, metadata={}), additional_thought=None)
Question: Starting at 14:45, what time was it 50 minutes earlier? Return HH:MM in 24-hour format.
Expected
13:55
Provided
Response(response_text='', structured_data={'time': '20:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.165775e-05, total_msec=433.15500000000003, metadata={}), additional_thought=None)
Question: Starting at 02:10, what time is it after 35 minutes? Return HH:MM in 24-hour format.
Expected
02:45
Provided
Response(response_text='', structured_data={'time': '20:40'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.11085e-05, total_msec=422.17, metadata={}), additional_thought=None)
Question: Starting at 11:00, what time is it after 40 minutes? Return HH:MM in 24-hour format.
Expected
11:40
Provided
Response(response_text='', structured_data={'time': '12:20'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.1599650000000002e-05, total_msec=431.993, metadata={}), additional_thought=None)
Question: Starting at 09:25, what time is it after 45 minutes? Return HH:MM in 24-hour format.
Expected
10:10
Provided
Response(response_text='', structured_data={'time': '17:25'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.19094e-05, total_msec=438.18800000000005, metadata={}), additional_thought=None)
Question: Starting at 07:45, what time is it after 75 minutes? Return HH:MM in 24-hour format.
Expected
09:00
Provided
Response(response_text='', structured_data={'time': '19:30'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.170595e-05, total_msec=434.11899999999997, metadata={}), additional_thought=None)
Question: Starting at 11:05, what time is it after 75 minutes? Return HH:MM in 24-hour format.
Expected
12:20
Provided
Response(response_text='', structured_data={'time': '11:50'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.2088450000000004e-05, total_msec=441.769, metadata={}), additional_thought=None)
Question: Starting at 06:35, what time was it 10 minutes earlier? Return HH:MM in 24-hour format.
Expected
06:25
Provided
Response(response_text='', structured_data={'time': '18:35'}, usage=LLMUsage(tokens_in=99, tokens_out=16, cost=2.11279e-05, total_msec=422.558, metadata={}), additional_thought=None)
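Every question in this run reduces to adding or subtracting minutes on a 24-hour clock. A reference answer is therefore a single modular operation: convert the start time to minutes since midnight, add the signed offset, reduce modulo 1440, and format the result back as HH:MM. The minimal sketch below assumes the two question templates that appear in this log ("was it N minutes earlier" and "is it after N minutes"). `shift_time`, `solve`, and `grade` are hypothetical helpers, not part of the benchmark harness, and `grade` assumes a strict bare-`HH:MM` match on `structured_data['time']`.

```python
import re

# The two question templates seen in this run (assumed to be the only ones).
QUESTION_RE = re.compile(
    r"Starting at (\d{2}):(\d{2}), what time "
    r"(?:was it (\d+) minutes earlier|is it after (\d+) minutes)\?"
)
# A valid 24-hour time: 00:00 through 23:59.
HHMM_RE = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def shift_time(hhmm: str, delta_minutes: int) -> str:
    """Add (or subtract, if negative) minutes on a 24-hour clock, wrapping at midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    # Python's % always returns a non-negative result for a positive modulus,
    # so times before midnight wrap back to the previous day correctly.
    total = (hours * 60 + minutes + delta_minutes) % (24 * 60)
    return f"{total // 60:02d}:{total % 60:02d}"


def solve(question: str) -> str:
    """Compute the reference answer for one benchmark question."""
    m = QUESTION_RE.search(question)
    if m is None:
        raise ValueError(f"unrecognized question: {question!r}")
    hh, mm, earlier, later = m.groups()
    delta = -int(earlier) if earlier is not None else int(later)
    return shift_time(f"{hh}:{mm}", delta)


def grade(question: str, provided: str) -> bool:
    """True only if provided is a bare HH:MM string equal to the reference answer."""
    return HHMM_RE.fullmatch(provided) is not None and provided == solve(question)


print(solve("Starting at 22:50, what time is it after 90 minutes? Return HH:MM in 24-hour format."))
```

Applied to the 40 questions above, `solve` reproduces every Expected value, including the midnight wrap for 22:50 plus 90 minutes (00:20). The strict format check in `grade` also rejects the two responses that wrote their working into the `time` field, such as `'16:35 - 40 min = 16:55'`, independently of whether the arithmetic in them is right.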