TechTakes

OpenAI o3 beats FrontierMath — because OpenAI funded the test and had access to the questions (pivot-to-ai.com)

submitted 5 months ago by dgerard@awful.systems to c/techtakes@awful.systems

[–] blakestacey@awful.systems 27 points 5 months ago (2 children)

Tamay Besiroglu from Epoch AI says they were “restricted from disclosing the partnership” until the o3 launch. Their contract “specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset.”

If you had no problems with that contract, then I don't trust your ethical judgment as a scientist.

[–] self@awful.systems 16 points 5 months ago

absolutely; there’s no reason to hide the funding source and OpenAI’s access unless you’re grifting. I feel bad for the mathematicians working on FrontierMath who didn’t know though. imagine wasting valuable time on something like this then finding out it was all just a marketing stunt devised by grifters.

[–] BigMuffin69@awful.systems 6 points 5 months ago

“has data access to much but not all of the dataset.”

Huh! I wonder which part of the dataset had the 25% of questions they got right in it 🙃

[–] zbyte64@awful.systems 21 points 5 months ago

Besiroglu says OpenAI did have access to many of the FrontierMath problems and solutions — but he added “we have a verbal agreement that these materials will not be used in model training.”

It's not like the company building a plagiarism tool would use said tool to plagiarize training data. That would be inconceivable.

[–] BigMuffin69@awful.systems 20 points 5 months ago* (last edited 5 months ago)

I can't believe they fucking got me with this one. I remember back in August(?) Epoch was getting quotes from top mathematicians like Terence Tao to review the benchmark, and he was quoted saying something like: it would be a big deal for a model to do well on this benchmark, and it will be several years before a model can solve all these questions organically, etc. So when o3 dropped and got a big jump from SotA, people (myself included) were blown away. At the same time red flags were going up in my mind: Epoch was yapping about how this test was completely confidential and no one would get to see their very special test so the answers wouldn't get leaked. But then how in the hell did they evaluate this model on the test? There's no way o3 was run locally by Epoch at ~$1000 a question -> OAI had to be given the benchmark to run against in house -> maybe they had multiple attempts against it and were tuning the model / recovering questions from API logs / paying mathematicians in house to produce answers to the problems so they could generate their own solution set??

No. The answer is much stupider. The entire company of Epoch ARE mathematicians working for OAI to make marketing grift to pump the latest toy. They got me lads, I drank the snake oil prepared specifically for people like me to drink :(

[–] sc_griffith@awful.systems 16 points 5 months ago

I fucking knew it!!! I don't even know why I feel so vindicated for calling out such an obvious fraud tbh. anyone, besides possibly a HN poster, could have seen it coming

[–] self@awful.systems 13 points 5 months ago (2 children)

Besiroglu says OpenAI did have access to many of the FrontierMath problems and solutions — but he added “we have a verbal agreement that these materials will not be used in model training.”

ooh, a verbal agreement! incredible! altman & co didn’t even have to do the typical slimy corporate move and pay an intern to barely modify the original materials into the input for the training corpus, since that verbal agreement wasn’t legally binding and behind the scenes OpenAI can just go “oopsy woopsy we swear it won’t happen again” and who’s gonna stop them?

[–] kamenlady@lemmy.world 6 points 5 months ago

Oops, i did it again, all the way

[–] ShakingMyHead@awful.systems 4 points 5 months ago

That's what I was thinking as well. All they have to do is look the other way.

[–] pja@awful.systems 10 points 5 months ago

So it looks like Mr. “Not consistently candid” has been at it again?

I will admit that they got me with this one: I genuinely thought the FrontierMath results meant something real. I didn’t think they would be that brazen about rigging a benchmark that was explicitly advertised as being kept private so that AI companies couldn’t train on the questions. More fool me I guess.