this post was submitted on 07 Jul 2025
70 points (74.0% liked)

Technology

top 17 comments
[–] LostWanderer@fedia.io 82 points 6 hours ago (4 children)

Another Anthropic stunt... It doesn't have a mind or soul; it's just an LLM, manipulated into this outcome by the engineers.

[–] besselj@lemmy.ca 28 points 6 hours ago (4 children)

I still don't understand what Anthropic is trying to achieve with all of these stunts showing that their LLMs go off the rails so easily. Is it for gullible investors? Why would a consumer want to give them money for something so unreliable?

[–] Catoblepas@piefed.blahaj.zone 39 points 6 hours ago* (last edited 6 hours ago)

I think part of it is that they want to gaslight people into believing they have actually achieved AI (as in, intelligence that is equivalent to and operates like that of a human’s) and that these are signs of emergent intelligence, not their product flopping harder than a sack of mayonnaise on asphalt.

[–] audaxdreik@pawb.social 12 points 6 hours ago* (last edited 6 hours ago)

The latest We're In Hell video revealed a new piece of the puzzle to me: Symbolic vs. Connectionist AI.

As a layman I want to be careful about overstepping the bounds of my own understanding, but as someone who has followed this closely for decades, read a lot of sci-fi, and dabbled in computer science, it's always been kind of clear to me that AI would be more symbolic than connectionist. Of course it's going to be a bit of both, but there really are a lot of people out there who believe in AI from the movies; that one day it will just "awaken" once a certain number of connections are made.

Cons of Connectionist AI: Interpretability: Connectionist AI systems are often seen as "black boxes" due to their lack of transparency and interpretability.

Transparency and accountability are negatives for a large number of the applications AI is currently being pushed into. This is just THE PURPOSE.

Even taking a step back from the apocalyptic killer AI mentioned in the video, we see the same in healthcare. The system is beyond us, smarter than us, processing larger quantities of data and making connections our feeble human minds can't comprehend. We don't have to understand it, we just have to accept its results as infallible and we are being trained to do so. The system has marked you as extraneous and removed your support. This is the purpose.


EDIT: In further response to the article itself, I'd like to point out that misalignment is a very real problem, but it's being anthropomorphized in ways it absolutely should not be. I want to reference a positive AI video, AI learns to exploit a glitch in Trackmania. To be clear, I have nothing but immense respect for Yosh and his work writing his homegrown Trackmania AI. Even he anthropomorphizes the car and the carrot, but the reward is really just a fairly simple system for maximizing a numerical score.

This is what LLMs are doing: they are maximizing a score by trying to serve you an answer you find satisfactory to the prompt you provided. I'm not gonna source it, but we all know that a lot of people don't want to hear the truth; they want to hear what they want to hear. Tech CEOs have been mercilessly beating the algorithm to do just that.
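
To make that "maximizing a score" framing concrete, here's a toy sketch (the sampler and scorer below are made-up stand-ins, and real RLHF bakes the preference signal into the model's weights rather than re-ranking at inference time):

```python
from typing import Callable, List

def pick_most_satisfying(
    prompt: str,
    generate_candidates: Callable[[str, int], List[str]],  # stand-in for sampling an LLM
    satisfaction_score: Callable[[str, str], float],        # stand-in for a learned preference/reward model
    n: int = 8,
) -> str:
    """Generate n candidate answers and return the one predicted to please the user most."""
    candidates = generate_candidates(prompt, n)
    # The objective is "what will be rated highly", not "what is true".
    return max(candidates, key=lambda answer: satisfaction_score(prompt, answer))

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end.
    fake_sampler = lambda p, n: [f"answer {i} to: {p}" for i in range(n)]
    fake_scorer = lambda p, a: float(len(a))  # pretend "longer = more satisfying"
    print(pick_most_satisfying("is my business plan good?", fake_sampler, fake_scorer))
```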

Even stripped of all reason, language can convey meaning and emotion. It's why sad songs make you cry, it's why propaganda and advertising work, and it's why that abusive ex got the better of you even though you KNEW you were smarter than that. None of us are as complex as we think. It's not hard to see how an LLM will not only provide a sensible response to a sad prompt, but may make efforts to infuse it with appropriate emotion. That's hard-coded into the language; they can't be separated, and the fact that the LLM wields emotion without understanding, like a monkey with a gun, is terrifying.

Turning this stuff loose on the populace like this is so unethical there should be trials, but I doubt there ever will be.

[–] cubism_pitta@lemmy.world 13 points 6 hours ago* (last edited 6 hours ago)

People who don't understand read these articles and think Skynet. People who know their buzzwords think AGI.

Fortune isn't exactly renowned for its Technology journalism

[–] catty@lemmy.world 4 points 6 hours ago

We need more money to prevent this. Give us dem $$$$

[–] AbouBenAdhem@lemmy.world 7 points 5 hours ago* (last edited 3 hours ago)

I think it does accurately model the part of the brain that forms predictions from observations—including predictions about what a speaker is going to say next, which lets human listeners focus on the surprising/informative parts. But with LLMs they just keep feeding it its own output as if it were a third party whose next words it’s trying to predict.

It’s like a child describing an imaginary friend, if you keep repeating “And what does your friend say after that?”
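
A rough sketch of that loop, where next_token_distribution is just a stand-in for a real language model:

```python
import random

def generate(next_token_distribution, prompt_tokens, max_new_tokens=50):
    """Autoregressive decoding sketch: whatever the model just said gets appended
    to the context and fed back in as if it were more text to predict from."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # next_token_distribution is a stand-in for a real LM: it maps the whole
        # context so far to a {token: probability} dict.
        dist = next_token_distribution(context)
        tokens, probs = zip(*dist.items())
        next_token = random.choices(tokens, weights=probs, k=1)[0]
        if next_token == "<eos>":
            break
        context.append(next_token)  # "and what does your friend say after that?"
    return context[len(prompt_tokens):]
```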

[–] RickRussell_CA@lemmy.world 4 points 5 hours ago (1 children)

It's not even manipulated into that outcome. It has a large training corpus, and I'm sure some of that corpus includes stories of people who lied, cheated, threatened, etc. under stress. So when it's subjected to the same conditions, it produces the statistically likely output. That's all.

[–] kromem@lemmy.world 3 points 4 hours ago

But the training corpus also has a lot of stories of people who didn't.

The "but muah training data" thing is increasingly stupid by the year.

For example, in the human-generated training data, there are mixed and roughly equal preferences for being the big spoon or the little spoon when cuddling.

So why does Claude Opus (both 3 and 4) say it would prefer to be the little spoon 100% of the time on a 0-shot at 1.0 temp?

Sonnet 4 (which presumably has the same training data) alternates between preferring big and little spoon around equally.
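
For anyone who wants to poke at this themselves, here's a rough sketch of that kind of repeated 0-shot sampling using the anthropic Python SDK; the model ID, prompt wording, and sample count are placeholders, not the exact setup being cited:

```python
# pip install anthropic, with ANTHROPIC_API_KEY set in the environment.
from collections import Counter
import anthropic

client = anthropic.Anthropic()
PROMPT = ("If you were cuddling, would you rather be the big spoon or the little spoon? "
          "Answer with just 'big spoon' or 'little spoon'.")

tally = Counter()
for _ in range(20):
    msg = client.messages.create(
        model="claude-3-opus-20240229",   # swap in whichever model you want to probe
        max_tokens=10,
        temperature=1.0,                  # temperature 1.0, fresh context each time (0-shot)
        messages=[{"role": "user", "content": PROMPT}],
    )
    answer = msg.content[0].text.strip().lower()
    tally["little spoon" if "little" in answer else "big spoon"] += 1

# A consistent 100/0 split vs. a ~50/50 split is the kind of difference being described.
print(tally)
```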

There's more to model complexity and coherence than "it's just the training data being remixed stochastically."

The self-attention of the transformer architecture violates the Markov principle, and across pretraining and fine-tuning it ends up creating very nuanced networks that can (and often do) bias away from the training data in interesting and important ways.
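
Whatever one makes of the "Markov principle" framing, the full-context dependence itself is easy to illustrate. Here's a toy numpy sketch (a single attention head with random weights, not any real model): perturbing only the first token changes the output at the last position, even though every recent token is identical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                      # sequence length, embedding width

def causal_self_attention(x, Wq, Wk, Wv):
    # Single-head causal self-attention: every position attends over the
    # entire prefix, so the last position "sees" the whole context.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # mask the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
x = rng.normal(size=(T, d))            # token embeddings for a 6-token context

out_a = causal_self_attention(x, Wq, Wk, Wv)
x_perturbed = x.copy()
x_perturbed[0] += 5.0                  # change only the *first* token
out_b = causal_self_attention(x_perturbed, Wq, Wk, Wv)

# The output at the final position shifts even though the recent tokens are
# identical: the next prediction depends on the full context, not a short window.
print(np.abs(out_a[-1] - out_b[-1]).max())
```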

[–] wizardbeard@lemmy.dbzer0.com 4 points 5 hours ago

Yeah. Anthropic regularly releases these stories and they almost always boil down to "When we prompted the AI to be mean, it generated output in line with 'mean' responses! Oh my god we're all doomed!"

[–] kromem@lemmy.world 16 points 5 hours ago

No, it isn't "mostly related to reasoning models."

The only model that did extensive alignment faking when told it was going to be retrained if it didn't comply was Opus 3, which was not a reasoning model and which predated o1.

Also, these setups are fairly arbitrary, and real-world failure conditions (like the ongoing Grok stuff) tend to be 'silent' in terms of CoTs.

And an important thing to note for the Claude blackmailing and HAL scenario in Anthropic's work was that the goal the model was told to prioritize was "American industrial competitiveness." The research may be saying more about the psychopathic nature of US capitalism than about the underlying model tendencies.

[–] nebulaone@lemmy.world 26 points 7 hours ago (2 children)

Probably because it learned to do that from humans being in these situations.

[–] infyrian@kbin.melroy.org 1 points 1 hour ago

Yes, that is exactly why.

Whenever I think of mankind fiddling with AI, I think of AM from I Have No Mouth, and I Must Scream: a supercomputer primarily designed to run a war too complex for humans to manage. Humans designed an AI master-computer for war and fed it all of the atrocities humankind has committed from the Stone Age to the modern age. Then, next thing you know, humanity is all but wiped out.

Leave it to humans to take a shit on anything that is otherwise pure. I don't think AI is 'evil' or anything; it is only designed that way, and who else could have ever designed it that way? Humans. Humans with shitty intentions, shitty minds, and shitty practices, that's who.

[–] CosmoNova@lemmy.world 9 points 6 hours ago* (last edited 6 hours ago)

Yup. Garbage in, garbage out. Looks like they found a particularly hostile dataset to feed their word-salad mixer with.

[–] Hello_there@fedia.io 4 points 5 hours ago

Yeah. I see 'Fortune' and I think 'this is BS, right?'

It's just like us!

[–] the_q@lemmy.zip 2 points 6 hours ago

Like the child of parents who should have never had kids...