So it looks like Mr. “Not consistently candid” has been at it again?
I will admit that they got me with this one: I genuinely thought the FrontierMath results meant something real. I didn’t think they would be that brazen about rigging a benchmark that was explicitly advertised as being kept private so that AI companies couldn’t train on the questions. More fool me I guess.