TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

MODERATORS

dgerard@awful.systems

LLMs average <5% on 2025 Math Olympiad; award each other 20x points (arxiv.org)

submitted 3 months ago by slop_as_a_service@awful.systems to c/techtakes@awful.systems

"Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as "trivial", even when their validity was crucial."

[–] swlabr@awful.systems 33 points 3 months ago* (last edited 3 months ago) (7 children)

“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as “trivial”, even when their validity was crucial.”

LLMs achieve reasoning level of average rationalist

[–] V0ldek@awful.systems 17 points 3 months ago (3 children)

This is actually an accurate representation of most "gifted olympiad laureate attempting to solve a freshman CS problem on the blackboard" students I went to uni with.

Jumps to the front 5 seconds after the task is assigned, bluffs that the problem is trivial, tries to salvage their reasoning for 5 minutes when questioned by the tutor, turns out the theorem they said was trivial is actually false, sits down having wasted 10 minutes of everyone's time.

[–] swlabr@awful.systems 7 points 3 months ago (1 child)

This needed a TW jfc (jk, uh, sorta)

[–] V0ldek@awful.systems 7 points 3 months ago

TW: contains real chuds
