this post was submitted on 04 Apr 2025
343 points (88.9% liked)

Technology

68348 readers
4167 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] Imgonnatrythis@sh.itjust.works 75 points 1 day ago (3 children)

"Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains."

That is precisely how I do math. Feel a little targeted that they called this odd.

[–] echodot@feddit.uk 4 points 16 hours ago* (last edited 16 hours ago) (2 children)

But you're doing two calculations now: an approximate one and another on the last digits. Since you're going to do the approximate calculation anyway, you might as well just do the accurate calculation and be done in one step.

This solution, while it works, has the feeling of evolution. No intelligent design, which I suppose makes sense considering the AI did essentially evolve.

[–] sapetoku@sh.itjust.works 6 points 11 hours ago

No intelligent design, which I suppose makes sense considering the AI did essentially evolve.

And that made a lot of people angry

[–] Imgonnatrythis@sh.itjust.works 8 points 16 hours ago

Appreciate the advice on how my brain should work.

[–] JayGray91@lemmy.zip 24 points 1 day ago (1 children)

I think it's odd in the sense that it's supposed to be software, so it should already know what 36 plus 59 is in a picosecond instead of doing mental arithmetic like we do.

At least that's my takeaway

[–] shawn1122@lemm.ee 13 points 20 hours ago* (last edited 20 hours ago) (1 children)

This is what the ARC-AGI test by Chollet has also revealed of current AI / LLMs. They have a tendency to approach problems with this trial and error method and can be extremely inefficient (in their current form) with anything involving abstract / deductive reasoning.

Most LLMs do terribly at the test with the most recent breakthrough being with reasoning models. But even the reasoning models struggle.

ARC-AGI is simple, but it demands a keen sense of perception and, in some sense, judgment. It consists of a series of incomplete grids that the test-taker must color in based on the rules they deduce from a few examples; one might, for instance, see a sequence of images and observe that a blue tile is always surrounded by orange tiles, then complete the next picture accordingly. It’s not so different from paint by numbers.
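A hypothetical miniature in the spirit of such a task: suppose the rule deduced from the examples is "every blue tile gets orange neighbors." Applying it in code (the rule, grid encoding, and function name are all made up for illustration; real ARC tasks are far more varied) might look like:

```python
def apply_rule(grid):
    """Surround every 'B' (blue) cell with 'O' (orange) in its
    4-neighborhood -- the assumed rule deduced from the examples."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]       # copy so the input grid is untouched
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 'B':
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == '.':
                        out[nr][nc] = 'O'
    return out
```

The catch the test exposes is that the hard part is not applying the rule but inducing it from a handful of examples, which is exactly where the models struggle.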

The test has long seemed intractable to major AI companies. GPT-4, which OpenAI boasted in 2023 had “advanced reasoning capabilities,” didn’t do much better than the zero percent earned by its predecessor. A year later, GPT-4o, which the start-up marketed as displaying “text, reasoning, and coding intelligence,” achieved only 5 percent. Gemini 1.5 and Claude 3.7, flagship models from Google and Anthropic, achieved 5 and 14 percent, respectively.

https://archive.is/7PL2a

[–] Goretantath@lemm.ee 1 points 8 hours ago

It's funny because I approach life with a trial-and-error method too. It's not efficient, but I get the job done in the end. I always see others who don't, who give up, like the people bad at computers who ask the company's tech support to fix the problem instead of thinking about it for two seconds, and wonder where life went wrong.

[–] Kolanaki@pawb.social 36 points 1 day ago (4 children)

I use a calculator. Which an AI should also do, instead of needing to do weird shit to do math.

[–] sapetoku@sh.itjust.works 5 points 10 hours ago

A regular AI should use a calculator subroutine, not try to rediscover basic math every time it's asked something.

[–] Goretantath@lemm.ee 1 points 8 hours ago

Yes, you shove it off onto another tool to do it for you instead of doing it yourself, and the AI doesn't.

[–] Jakeroxs@sh.itjust.works 17 points 1 day ago

Function calling is a thing chatbots can do now.
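The pattern works roughly like this: the model emits a structured tool request instead of doing arithmetic "in its head," and the host application runs the tool and feeds back the exact result. A minimal sketch, with made-up names (`dispatch_tool`, the request dict shape) that don't correspond to any specific vendor's API:

```python
import ast
import operator

# Whitelist of arithmetic operators the calculator will evaluate
SAFE_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str):
    """Safely evaluate a basic arithmetic expression (no eval())."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval"))

def dispatch_tool(request: dict):
    """Route a model-emitted tool call to the matching local function."""
    tools = {"calculator": calculator}
    return tools[request["name"]](**request["arguments"])

# Instead of guessing digits, the model emits something like:
request = {"name": "calculator", "arguments": {"expression": "36 + 59"}}
print(dispatch_tool(request))  # 95
```

The model only has to decide *that* a calculation is needed and format the request; the exact arithmetic is done by ordinary code.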