this post was submitted on 20 Jan 2025
64 points (100.0% liked)

TechTakes

1557 readers
270 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] BigMuffin69@awful.systems 17 points 20 hours ago* (last edited 20 hours ago)

I can't believe they fucking got me with this one. I remember back in August(?) Epoch was getting quotes from top mathematicians like Tarrence Tao to review the benchmark and he was quoted saying like it would be a big deal for a model to do well on this benchmark, it will be several years before a model can solve all these questions organically etc so when O3 dropped and got a big jump from SotA, people (myself) were blown away. At the same time red flags were going up in my mind: Epoch was yapping about how this test was completely confidential and no one would get to see their very special test so the answers wouldn't get leaked. But then how in the hell did they evaluate this model on the test? There's no way O3 was run locally by Epoch at ~$1000 a question -> OAI had to be given the benchmark to run against in house -> maybe they had multiple attempts against it and were tuning the model/ recovering questions from api logs/paying mathematicians in house to produce answers to the problems so they could generate their own solution set??

No. The answer is much stupider. The entire company of Epoch ARE mathematicians working for OAI to make marketing grift to pump the latest toy. They got me lads, I drank the snake oil prepared specifically for people like me to drink :(