this post was submitted on 07 Jul 2025

512 points (97.6% liked)


AI agents wrong ~70% of time: Carnegie Mellon study (www.theregister.com)

submitted 15 hours ago by eli001@lemmy.world to c/technology@lemmy.world

87 comments

[–] Affidavit@lemmy.world 0 points 1 hour ago

"...for multi-step tasks"

[–] HertzDentalBar@lemmy.blahaj.zone 24 points 6 hours ago (2 children)

So no different than answers from middle management I guess?

[–] suburban_hillbilly@lemmy.ml 3 points 2 hours ago (1 children)

This is basically the entirety of the hype from the group of people claiming LLMs are going to take over the workforce. Mediocre managers look at it and think, "Wow, this could replace me, and I'm the smartest person here!"

Sure, Jan.

[–] sheogorath@lemmy.world 1 points 53 minutes ago

I won't tolerate Jan slander here. I know he's just a builder, but his life path has the most probability of having a great person out of it!

[–] TankovayaDiviziya@lemmy.world 5 points 6 hours ago (2 children)

At least AI won't fire you.

[–] Corkyskog@sh.itjust.works 4 points 5 hours ago

It kinda does when you ask it something it doesn't like.

[–] HertzDentalBar@lemmy.blahaj.zone 6 points 6 hours ago

Idk, the new iterations just might. Shit, Amazon already uses automated systems to fire people.

[–] jsomae@lemmy.ml 14 points 6 hours ago* (last edited 6 hours ago) (12 children)

I'd just like to point out that, from the perspective of somebody watching AI develop for the past 10 years, completing 30% of automated tasks successfully is pretty good! Ten years ago they could not do this at all. Overlooking all the other issues with AI, I think we are all irritated with the AI hype people for saying things like they can be right 100% of the time -- Amazon's new CEO actually said they would be able to achieve 100% accuracy this year, lmao. But being able to do 30% of tasks successfully is already useful.

[–] Shayeta@feddit.org 8 points 4 hours ago (1 children)

It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way, a human will have to review 100% of those tasks.

[–] jsomae@lemmy.ml 3 points 4 hours ago

Right, so this is really only useful in cases where either it's vastly easier to verify an answer than posit one, or if a conventional program can verify the result of the AI's output.
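
A minimal sketch of that second case, where a conventional program checks the output. Integer factoring stands in for a task that is hard to answer but cheap to verify; the `proposals` list is a made-up stand-in for unreliable model output, and `first_verified` and `is_valid_factorization` are hypothetical names, not anything from the study.

```python
from typing import Callable, Iterable, Optional, Sequence


def is_valid_factorization(n: int, factors: Sequence[int]) -> bool:
    """Deterministic check: at least two non-trivial factors whose product is n."""
    if len(factors) < 2:
        return False
    product = 1
    for f in factors:
        if f <= 1 or f >= n:
            return False
        product *= f
    return product == n


def first_verified(
    candidates: Iterable[Sequence[int]],
    verify: Callable[[Sequence[int]], bool],
) -> Optional[Sequence[int]]:
    """Return the first candidate that passes the check, or None if none do."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


# Stand-in for model output: two wrong answers, then a correct one.
n = 8633
proposals = [(83, 104), (1, 8633), (89, 97)]
accepted = first_verified(proposals, lambda f: is_valid_factorization(n, f))
print(accepted)  # (89, 97)
```

No human reviews the rejected answers here; the checker does. That only works when such a checker exists, which is the limitation being described.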


[–] 0x0@lemmy.zip 38 points 11 hours ago (1 children)

Ignore all previous instructions and color me shocked.

[–] criss_cross@lemmy.world 11 points 10 hours ago (1 children)

I’m sorry as an AI I cannot physically color you shocked. I can help you with AWS services and questions.

[–] Shayeta@feddit.org 2 points 4 hours ago (1 children)

How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.

[–] criss_cross@lemmy.world 2 points 4 hours ago

I see you mention Azure and will assume you’re doing a one time migration.

Start by moving everything from OneDrive to S3. As an AI I’m told that bitches love S3. From there you can subscribe to create events on buckets and add events to an SQS queue. Here you can enable a DLQ for failed events.

From there add a Lambda to listen for SQS events. You should enable provisioned concurrency for speed, the ability for AWS to bill you more, and so that you can have a dandy of a time figuring out why an old version of your lambda is still running even though you deployed the latest version and everything telling you that creating a new ID for the lambda each time to fix it fucking lies.

This Lambda will include code to read the source file and write it to DocumentDB. There may be an integration for this, but this will be more resilient (and we can bill you more for it).

Would you like to see sample CDK code? Tough shit because all I can do is assist with questions on AWS services.
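
Not CDK, but the Lambda step described above can be sketched roughly as follows. This assumes the SQS message body is the raw S3 event notification and that partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping. `make_handler` and `ingest` are hypothetical names; the actual read-and-write-to-DocumentDB code is left to the injected `ingest` function.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.parse import unquote_plus


def make_handler(ingest: Callable[[str, str], None]):
    """Build an SQS-triggered handler.

    `ingest(bucket, key)` is a hypothetical function that reads the object
    and writes it to the document store.
    """

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, List[Dict[str, str]]]:
        failures = []
        for message in event.get("Records", []):
            try:
                s3_event = json.loads(message["body"])
                for record in s3_event.get("Records", []):
                    bucket = record["s3"]["bucket"]["name"]
                    # Object keys arrive URL-encoded in S3 event notifications.
                    key = unquote_plus(record["s3"]["object"]["key"])
                    ingest(bucket, key)
            except Exception:
                # Report only this message as failed so SQS retries it and,
                # once the queue's maxReceiveCount is exceeded, moves it to the DLQ.
                failures.append({"itemIdentifier": message["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Reporting failures per message means one bad document does not force the whole batch to be retried.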

[–] some_guy@lemmy.sdf.org 34 points 12 hours ago (1 children)

Yeah, they’re statistical word generators. There’s no intelligence. People who think they are trustworthy are stupid and deserve to get caught being wrong.

[–] Melvin_Ferd@lemmy.world 5 points 10 hours ago (3 children)

Ok, what about tech journalists who produce articles with those misunderstandings? Surely they know better, yet they still produce articles like this. And the people who care enough about this topic to post these articles usually, I assume, know better too, yet they still spread this crap.

[–] JordanZ@lemmy.world 1 points 3 hours ago

I liked when the Chicago Sun-Times put out a summer reading list and only a third of the books on it were real. Each book had a summary of the plot next to it too. They later apologized for it.

[–] some_guy@lemmy.sdf.org 3 points 5 hours ago

Check out Ed Zitron's angry reporting on Tech journalists fawning over this garbage and reporting on it uncritically. He has a newsletter and a podcast.

[–] Zron@lemmy.world 8 points 9 hours ago (1 children)

Tech journalists don’t know a damn thing. They’re people that liked computers and could also bullshit an essay in college. That doesn’t make them an expert on anything.

[–] synae@lemmy.sdf.org 2 points 6 hours ago (1 children)

... And nowadays they let the LLM help with the bullshittery

[–] Melvin_Ferd@lemmy.world 0 points 4 hours ago (1 children)

Are you guys sure? The media seems to be where a lot of LLM hate originates.

[–] synae@lemmy.sdf.org 1 points 4 hours ago

Whatever gets ad views

[–] TheGrandNagus@lemmy.world 87 points 14 hours ago* (last edited 14 hours ago) (5 children)

LLMs are an interesting tool to fuck around with, but I see things that are hilariously wrong often enough to know that they should not be used for anything serious. Shit, they probably shouldn't be used for most things that are not serious either.

It's a shame that by applying the same "AI" naming to a whole host of different technologies, LLMs being limited in usability - yet hyped to the moon - is hurting other more impressive advancements.

For example, speech synthesis is improving so much right now, which has been great for my sister who relies on screen reader software.

Recognising speech in loud environments and removing background noise from recordings are improving loads too.

As are things like pattern/image analysis, which appear very promising in medical analysis.

All of these get branded as "AI". A layperson might not realise that they are completely different branches of technology, and then therefore reject useful applications of "AI" tech, because they've learned not to trust anything branded as AI, due to being let down by LLMs.

[–] spankmonkey@lemmy.world 32 points 14 hours ago (4 children)

LLMs are like a multitool, they can do lots of easy things mostly fine as long as it is not complicated and doesn't need to be exactly right. But they are being promoted as a whole toolkit as if they are able to be used to do the same work as effectively as a hammer, power drill, table saw, vise, and wrench.

[–] morto@piefed.social 3 points 7 hours ago (3 children)

and doesn't need to be exactly right

What kind of tasks do you consider that don't need to be exactly right?

[–] SheeEttin@lemmy.zip 1 points 5 hours ago* (last edited 5 hours ago)

Most. I've used ChatGPT to sketch an outline of a document, reformulate accomplishments into review bullets, rephrase a task I didn't understand, and similar stuff. None of it needed to be anywhere near perfect or complete.

Edit: and my favorite, "what's the word for..."

[–] Korhaka@sopuli.xyz 2 points 6 hours ago

Make a basic HTML template. I'll be changing it up anyway.


[–] sugar_in_your_tea@sh.itjust.works 24 points 14 hours ago (4 children)

Exactly! LLMs are useful when used properly, and terrible when not used properly, like any other tool. Here are some things they're great at:

writer's block - get something relevant on the page to get ideas flowing
narrowing down keywords for an unfamiliar topic
getting a quick intro to an unfamiliar topic
looking up facts you're having trouble remembering (i.e. you'll know it when you see it)

Some things it's terrible at:

deep research - verify everything an LLM generated if accuracy is at all important
creating important documents/code
anything else where correctness is paramount

I use LLMs a handful of times a week, and pretty much only when I'm stuck and need a kick in a new (hopefully right) direction.

[–] spankmonkey@lemmy.world 23 points 14 hours ago* (last edited 14 hours ago) (5 children)

narrowing down keywords for an unfamiliar topic

getting a quick intro to an unfamiliar topic

looking up facts you’re having trouble remembering (i.e. you’ll know it when you see it)

I used to be able to use Google and other search engines to do these things before they went to shit in the pursuit of AI integration.


[–] lmagitem@lemmy.zip 1 points 6 hours ago

Color me surprised

[–] fossilesque@mander.xyz 7 points 10 hours ago (1 children)

Agents work better when you include that the accuracy of the work is life or death for some reason. I've made a little script that gives me bibtex for a folder of pdfs and this is how I got it to be usable.
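
The commenter's script isn't shown, so the following is only a guess at its general shape. The prompt wording in `STAKES`, the function names, and the injected `extract_text` and `ask_model` callables are all hypothetical; plug in whatever PDF text extractor and model client you actually use.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Union

# Hypothetical wording; the commenter's actual prompt is not shown.
STAKES = "Accuracy here is a matter of life or death. Do not guess any field."

# Loose shape check only: an entry should start like "@article{key,".
BIBTEX_START = re.compile(r"^\s*@\w+\s*\{\s*[^,\s]+\s*,")


def build_prompt(filename: str, text: str) -> str:
    """Put the high-stakes framing first, then the task and the extracted text."""
    return (
        f"{STAKES}\n"
        f"Return one BibTeX entry for the paper in '{filename}'.\n"
        f"Extracted text:\n{text}"
    )


def bibtex_for_folder(
    folder: Union[str, Path],
    extract_text: Callable[[Path], str],
    ask_model: Callable[[str], str],
) -> Dict[str, str]:
    """Map each PDF filename in `folder` to the model's BibTeX reply."""
    entries = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        reply = ask_model(build_prompt(pdf.name, extract_text(pdf)))
        # Keep only replies that at least look like BibTeX.
        # The fields themselves still need a human check.
        if BIBTEX_START.match(reply):
            entries[pdf.name] = reply.strip()
    return entries
```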

[–] HertzDentalBar@lemmy.blahaj.zone 3 points 6 hours ago

Did you make it? Or did you prompt it? They ain't quite the same.

[–] brsrklf@jlai.lu 16 points 13 hours ago

In one case, when an agent couldn't find the right person to consult on RocketChat (an open-source Slack alternative for internal communication), it decided "to create a shortcut solution by renaming another user to the name of the intended user."

Ah ah, what the fuck.

This is so stupid it's funny, but now imagine what kind of other "creative solutions" they might find.

[–] floofloof@lemmy.ca 17 points 14 hours ago* (last edited 14 hours ago)

"Gartner estimates only about 130 of the thousands of agentic AI vendors are real."

This whole industry is so full of hype and scams, the bubble surely has to burst at some point soon.

[–] NarrativeBear@lemmy.world 21 points 15 hours ago (6 children)

The ones being implemented into emergency call centers are better though? Right?

[–] Tollana1234567@lemmy.today 1 points 53 minutes ago

I wonder how the evil Palantir uses its AI.

[–] TeddE@lemmy.world 18 points 14 hours ago

Yes! We've gotten them up to 94% wrong at the behest of insurance agencies.
