this post was submitted on 07 Apr 2025
36 points (100.0% liked)

TechTakes


"Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as "trivial", even when their validity was crucial."

[–] swlabr@awful.systems 31 points 1 week ago* (last edited 1 week ago) (7 children)

“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as “trivial”, even when their validity was crucial.”

LLMs achieve reasoning level of average rationalist

[–] V0ldek@awful.systems 16 points 1 week ago (3 children)

This is actually an accurate representation of most of the "gifted olympiad laureate attempting to solve a freshman CS problem on the blackboard" students I went to uni with.

Jumps to the front 5 seconds after the task is assigned, bluffs that the problem is trivial, spends 5 minutes trying to salvage their reasoning when questioned by the tutor, turns out the theorem they said was trivial is actually false, sits down having wasted 10 minutes of everyone's time.

[–] swlabr@awful.systems 7 points 1 week ago (1 children)

This needed a TW jfc (jk, uh, sorta)

[–] V0ldek@awful.systems 7 points 1 week ago

TW: contains real chuds
