this post was submitted on 13 Jan 2025
22 points (100.0% liked)

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned so many “esoteric” right wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).

Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

(Semi-obligatory thanks to @dgerard for starting this.)

[–] BigMuffin69@awful.systems 11 points 1 day ago* (last edited 1 day ago) (1 children)

Remember how OAI claimed that o3 had displayed superhuman levels on the mega hard Frontier Math exam written by Fields Medalists? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months, funded its creation, and refused to let the creators of the test publicly acknowledge this until after OAI did their big stupid magic trick.

From Subbarao Kambhampati via LinkedIn:

"𝐎𝐧 𝐭𝐡𝐞 𝐬𝐞𝐞𝐝𝐲 𝐨𝐩𝐭𝐢𝐜𝐬 𝐨𝐟 "𝑩𝒖𝒊𝒍𝒅𝒊𝒏𝒈 𝒂𝒏 𝑨𝑮𝑰 𝑴𝒐𝒂𝒕 𝒃𝒚 𝑪𝒐𝒓𝒓𝒂𝒍𝒍𝒊𝒏𝒈 𝑩𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌 𝑪𝒓𝒆𝒂𝒕𝒐𝒓𝒔" #SundayHarangue. One of the big reasons for the increased volume of "𝐀𝐆𝐈 𝐓𝐨𝐦𝐨𝐫𝐫𝐨𝐰" hype has been o3's performance on the "frontier math" benchmark--something that other models basically had no handle on.

We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1--and that the benchmark creators were not allowed to disclose this *until after o3*.

That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of "𝒐1/𝒐3 𝒘𝒆𝒓𝒆 𝒋𝒖𝒔𝒕 𝒃𝒆𝒊𝒏𝒈 𝒕𝒓𝒂𝒊𝒏𝒆𝒅 𝒐𝒏 𝒔𝒊𝒎𝒑𝒍𝒆 𝒎𝒂𝒕𝒉, 𝒂𝒏𝒅 𝒕𝒉𝒆𝒚 𝒃𝒐𝒐𝒕𝒔𝒕𝒓𝒂𝒑𝒑𝒆𝒅 𝒕𝒉𝒆𝒎𝒔𝒆𝒍𝒗𝒆𝒔 𝒕𝒐 𝒇𝒓𝒐𝒏𝒕𝒊𝒆𝒓 𝒎𝒂𝒕𝒉"--that the AGI tomorrow crowd seem to have--that 𝘖𝘱𝘦𝘯𝘈𝘐 𝘸𝘩𝘪𝘭𝘦 𝘯𝘰𝘵 𝘦𝘹𝘱𝘭𝘪𝘤𝘪𝘵𝘭𝘺 𝘤𝘭𝘢𝘪𝘮𝘪𝘯𝘨, 𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘭𝘺 𝘥𝘪𝘥𝘯'𝘵 𝘥𝘪𝘳𝘦𝘤𝘵𝘭𝘺 𝘤𝘰𝘯𝘵𝘳𝘢𝘥𝘪𝘤𝘵--is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don't completely believe that OpenAI didn't have access to the Olympiad/Frontier Math data before hand.. )

I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )

𝑫𝒐𝒊𝒏𝒈 𝒘𝒆𝒍𝒍 𝒐𝒏 𝒉𝒂𝒓𝒅 𝒃𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌𝒔 𝒕𝒉𝒂𝒕 𝒚𝒐𝒖 𝒉𝒂𝒅 𝒑𝒓𝒊𝒐𝒓 𝒂𝒄𝒄𝒆𝒔𝒔 𝒕𝒐 𝒊𝒔 𝒔𝒕𝒊𝒍𝒍 𝒊𝒎𝒑𝒓𝒆𝒔𝒔𝒊𝒗𝒆--𝒃𝒖𝒕 𝒅𝒐𝒆𝒔𝒏'𝒕 𝒒𝒖𝒊𝒕𝒆 𝒔𝒄𝒓𝒆𝒂𝒎 "𝑨𝑮𝑰 𝑻𝒐𝒎𝒐𝒓𝒓𝒐𝒘."

We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than "𝘸𝘦 𝘥𝘪𝘥𝘯'𝘵 𝘴𝘦𝘦 𝘵𝘩𝘢𝘵 𝘴𝘱𝘦𝘤𝘪𝘧𝘪𝘤 𝘱𝘳𝘰𝘣𝘭𝘦𝘮 𝘪𝘯𝘴𝘵𝘢𝘯𝘤𝘦 𝘥𝘶𝘳𝘪𝘯𝘨 𝘵𝘳𝘢𝘪𝘯𝘪𝘯𝘨" (see "In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities" https://lnkd.in/gZ2wBM_F ).

At the very least, this episode further argues for increased vigilance/skepticism on the part of the AI research community in how they parse the benchmark claims put out by commercial entities."

Big stupid snake oil strikes again.

[–] aio@awful.systems 1 points 1 day ago

That o3 does well on frontier math held-out set is impressive, no doubt

I think there is plenty of room for doubt still. elliotglazer on reddit writes:

Epoch's lead mathematician here. Yes, OAI funded this and has the dataset, which allowed them to evaluate o3 in-house. We haven't yet independently verified their 25% claim. To do so, we're currently developing a hold-out dataset and will be able to test their model without them having any prior exposure to these problems.

My personal opinion is that OAI's score is legit (i.e., they didn't train on the dataset), and that they have no incentive to lie about internal benchmarking performances. However, we can't vouch for them until our independent evaluation is complete.

(emphasis mine). So there is good reason to doubt that the "held-out dataset" even exists.

[–] sc_griffith@awful.systems 7 points 1 day ago (1 children)

trying to write a thread about polytopia but my images won't upload >:(. idk what i'm doing wrong, i've tried on both my desktop and my phone

[–] self@awful.systems 5 points 1 day ago (1 children)

it should be fixed… again. for some reason our image cache keeps getting into a state where it either stops accepting uploads or stops accepting requests at all. I plan to upgrade us to the latest version soon, but it’ll unfortunately involve a little bit of downtime: to upgrade pict-rs to a new point release, you have to run the migrate command, but it only works for the previous release. we’re two releases behind, so I have to custom package the in-between release just to get us there.

[–] sc_griffith@awful.systems 3 points 1 day ago (1 children)

i see! thanks for all your work <3. I think i'll just write the thread after the upgrade, i got partially done and it started eating my images again so maybe this just isn't the moment

[–] self@awful.systems 3 points 1 day ago

of course! re the images: uggh hell with it, I’m scheduling the maintenance and I’m gonna spend some time in the lead-up isolating a root cause for our breakage just in case the upgrade doesn’t fix it

[–] bitofhope@awful.systems 6 points 1 day ago

Starting to think we're about at the point where you could make the best search engine on the market in these three easy steps:

  1. Search Wikipedia for whatever the user typed and show the top result first.
  2. Check if dot com, org, and net exist and show them in the order of popularity.
  3. End of page.

[–] sailor_sega_saturn@awful.systems 6 points 1 day ago* (last edited 1 day ago) (1 children)

(oh no it's politics)

Trump's new cryptocurrency scheme is surprisingly forthright about being a pump & dump:

CIC Digital LLC, an affiliate of The Trump Organization, and Fight Fight Fight LLC collectively own 80% of the Trump Cards, subject to a 3-year unlocking schedule. CIC Digital LLC and Celebration Cards LLC, the owners of Fight Fight Fight LLC, will receive trading revenue derived from trading activities of Trump Meme Cards.

Essentially according to their own website, they started by selling 20%* of the tokens to the public, and over the next few years will... sell another 80% of the tokens to the public. To the moon!

* half of that they describe as "liquidity" instead of public distribution -- whatever that means.

My gut says that liquidity in this context means "making sure that there are tokens available to purchase for initial buyers" or in other words listing them on the market instead of distributing them at initial purchase price.

[–] sailor_sega_saturn@awful.systems 7 points 1 day ago* (last edited 1 day ago)

I read about this gross Robo Anne Frank LLM by a company called "School AI": Bluesky post (looks like via an activitypub bridge, but I can't be bothered to find the canonical link), News Article, School AI's website.

Gee it sure is weird how all these digital clones the AI companies keep coming up with all have the exact same (lack of a) personality.

[–] blakestacey@awful.systems 16 points 3 days ago (1 children)

So, the Wikipedia article about "prompt engineering" is pretty terrible. First source: OpenAI. Second: a blog. Third: OpenAI. Fourth: OpenAI's blog. ArXiv, arXiv, arXiv... 43 times. Hop on over to the Talk page, and we find this gem:

It is sometimes necessary to make assumptions to write an article (see WP:MNA).

Spoiler alert: that link doesn't justify anything. It basically advises against going off on tangents: There's no need to rehash the fact that evolution is a fact on every damn biology page. It does not say that Wikipedia should have an article on some creationist fantasy, like baraminology or flood geology, based entirely on creationist screeds that all cite each other.

[–] blakestacey@awful.systems 13 points 3 days ago

I have spent the last half-hour in the angry dome

[–] jax@awful.systems 19 points 4 days ago (2 children)

hells yeah it's time for some action - Drew DeVault is organizing a sit-in protest of Jack Dorsey's keynote at FOSDEM 2025.

[–] sinedpick@awful.systems 7 points 3 days ago* (last edited 3 days ago)

eh? I don't see Jackie D's keynote in the schedule, did the threat of a sit-in make them delete it? https://fosdem.org/2025/schedule/ edit: oh, it's linked from Drew's post.

[–] maol@awful.systems 5 points 3 days ago

The people of Muskogee will have their revenge on Jack Dorsey

[–] froztbyte@awful.systems 12 points 4 days ago (2 children)

there it is, sammy has gone and said people are just prompting the model wrong (I recall we’ve had that bit said here earlier)

but in true sammy grift: you just need to be asking the right questions to trump intelligence. “why do you want to suck, as a human?” sammy asks, not understanding a moment of humanity

[–] Amoeba_Girl@awful.systems 9 points 3 days ago (3 children)

how do you even define "raw, intellectual horsepower" and how does it differ from knowing how to formulate questions mother fucker

Pretty sure there's gonna be a contract somewhere that defines raw intelligence as "the amount of money you make for OpenAI."

[–] blakestacey@awful.systems 8 points 3 days ago (1 children)

"Raw, intellectual horsepower" means fucking an intellectual horse without a condom.

Oh, wait, that's rawdogging intellectual horsepower, my mistake.

[–] blakestacey@awful.systems 8 points 3 days ago (2 children)

shot:

Von Neumann arguably had the highest processor-type "horsepower" we know of plus his breadth of intellectual achievements is unparalleled.

chaser:

But imo Grothendieck is a better comparison point for ASI as his intelligence, while being strangely similar to LLMs in some dimensions

[–] V0ldek@awful.systems 4 points 2 days ago

I never thought I'd say this but... don't slander category theory like that, compared to LLMs it's downright useful

[–] Amoeba_Girl@awful.systems 5 points 3 days ago
[–] froztbyte@awful.systems 6 points 3 days ago* (last edited 3 days ago)

the bit about it that I find subtly glorious (in how remarkably fuckwitted it is) is the baseline idea of “intellectual horsepower”

I’m not surprised that this is a view they (of the company that’s effectively going “just 12 more DCs bro it’ll be enough compute bro I promise bro just watch”) hold and consider in such a simple mechanism-rating scale

but it is funny as fuck

[–] Soyweiser@awful.systems 7 points 4 days ago

What they promise: intelligent systems, revolutionary agents which can take over tasks!

What we get: https://youtu.be/izazdBpraC8

[–] froztbyte@awful.systems 6 points 3 days ago

okay so I’ve just found a new shirt I want

[–] swlabr@awful.systems 13 points 4 days ago (3 children)

This is extremely tangential to the areas of sneer interest, but seeing as this is the only technology related community I am in, I’m putting it here.

This song has been making the rounds on the charts/social media and I refuse to believe that it isn’t about the package management tool apt

[–] Amoeba_Girl@awful.systems 11 points 4 days ago

Image

yoooooooooooooo

[–] maol@awful.systems 8 points 4 days ago (1 children)

I hope Toni Basil and the Ting Tings get big royalty cheques.

[–] swlabr@awful.systems 6 points 4 days ago (1 children)

the writers of hey mickey have a writing credit on it, but the ting tings got shafted.

[–] maol@awful.systems 6 points 4 days ago* (last edited 4 days ago)

A shame. Seems like they were clearly inspired by it .... I guess you can't copyright a vibe.

[–] swlabr@awful.systems 8 points 4 days ago (1 children)

of course it is one in a long tradition of tech-tangential songs, including this banger about the inevitable collapse of tech bubbles

[–] Soyweiser@awful.systems 5 points 3 days ago

I remember boten anna as a tech song example

[–] froztbyte@awful.systems 12 points 5 days ago (1 children)

a couple weeks back, I was (bc reasons) looking around to see how to turn off goog's annoying gemini bullshit in an account, and you can!

except then even after doing that, accounts in that org still got prompts (in the form of in-app banners, and sparklebuttons in shit like gmail) to Try The Model

it looks like people aren't biting enough, because now you get it whether you like it or not, for the low low price of pushing up your base account fee! and I checked in one org - "Gemini App" is disabled org-wide, but the fucking prompt is immediately in the UI (and you get a modal popover opening gmail)

fuck these people so much

[–] mii@awful.systems 11 points 5 days ago (2 children)

Oh well. Nothing screams healthy business like force-feeding your product to every customer who can't hammer the conveniently hidden opt-out button fast enough. I'm sure Gemini is doing great.

[–] froztbyte@awful.systems 12 points 5 days ago (2 children)

oh, no no

nooooo no no no

there isn't an opt-out button

there is only:

  1. "Continue",
  2. "Learn More"

[–] bitofhope@awful.systems 13 points 4 days ago (1 children)

Hello, I'd like to punch you in the groin. Will you accept ~or would you like to learn more~?

Sorry, I didn't quite catch that. Did you say you accept?

Ah, you don't want to be punched in the groin. That's OK, I understand. We value your painless existence very much.

Now, obviously we cannot let you opt out of the Strictly Necessary punches in the groin. Surely you understand that if it's necessary to punch you in the groin, your permission or lack thereof is irrelevant. Rest assured, this applies only when we really have to punch you in the groin.

What, do you want me to list all the possible circumstances in which one might be obligated to punch you in the groin? Don't be unreasonable, now. I'm sure you know it when you see it.

That aside, I presume we can punch you in the groin for functional purposes? The kind that may not be strictly necessary, but serve a purpose in the functioning of our service.

Oh, we can't? It's OK, you have the right to make that choice. We don't judge. Anyway, we take it that you're probably at least cool with us punching you in the groin for the purposes of analyzing your behavior to improve our groin punching. Let me know if you decide you don't want us to do that anymore.

Oh, I thought you were cooler than that. Alright, if you hate the working class and want to make it harder for the poor, overworked developers to improve your experience, we'll do it your way. I guess we'll have to make do with just the groin punches that are strictly necessary or for marketing purposes.

Ah, aren't you observant. Have you ever noticed that all the adverts you get are really terrible? That's because advertisers need to be able to punch you in the groin to find out what you like and to make their ads more appealing to you. Just food for thought. But if you really insist…

Fine, fine. Marketing groin punches are out. As for your question, no we don't identify as an advertising company per se. But we are partnered with other companies that are in fact advertising companies. Would you like to adjust your preferences for our groin punching partners?

Well maybe to you it looks like the opt-out process we just went through should also cover this part but can we really know if we don't look?

Who's a good puppy? You're a good puppy, yes you are! ❤️

Will you deny us permission to punch you in the groin on behalf of AAAAAAAAAAA Inc. or will you not?

OK, so we can only punch you in the groin on behalf of AAAAAAAAAAA Inc. for the purposes of Legitimate Interest?

It means the kinds of purposes where there is a legitimate interest to punch you in the groin.

Why would you ask if you didn't want me to answer? Fine, that's a no for Legitimate Interest based groin punching on behalf of AAAAAAAAAAA Inc.

Will you deny us permission to punch you in the groin on behalf of AAAAAAAAAAB Inc. or will you not?

Oh, we have a total of six hundred and sixteen thousand six hundred and sixty-six partners in our crotch impactizing network.

Indeed, we are proud to have such a wide network of trusted allies.

Ugh, fine. I guess I can check the end of the list to see if there's a way to make a selection for all of them at once. Honestly, this form is starting to make me a bit dizzy as well.

Wow, who knew flipping through all those pages would take so long. There's a line in here that says "disagree to all", but there's no checkbox or anything. It's just there. Clicking it doesn't seem to change anything. You can probably assume it worked.

Please calm down, we're almost done. Would you like to accept and save?

Well it sounds like I mean "accept and save the options you just set", not the ones we offered initially, doesn't it?

Your groin punching settings have been applied. I don't think there were any mistakes, but if you need to change the settings, you can find the form hidden somewhere in this house, assuming we remembered to put it there.

[–] froztbyte@awful.systems 8 points 4 days ago (1 children)

this is so wildly on point

yours?

(it should become an internet copypasta and drift into mass consciousness)

[–] bitofhope@awful.systems 9 points 4 days ago

Thanks. I wrote this last night not expecting it to become so long, but I like to think the real work was done by thousands of very clever people with highly sophisticated moral compasses pretending not to understand privacy legislation.

[–] Soyweiser@awful.systems 10 points 4 days ago

I'm gonna build a special circle in hell for these people, together with the 'yes' or 'ask me again later' people. On this circle all the software your stack depends on will break your build and ship a new release every Friday at 5. Whohahhahah

[–] o7___o7@awful.systems 8 points 4 days ago (1 children)
[–] froztbyte@awful.systems 10 points 4 days ago

little known historical fact: G+ was actually the mark that service got on its popularity exam

[–] froztbyte@awful.systems 13 points 5 days ago (3 children)

some of the first research science on promptfondlers and model-affine dipshits is starting to see the light of day and, in what will surprise probably 0% of our regulars, it confirms some things

(I have grumped about their desire for outsourced thinking in the past myself)
