this post was submitted on 17 Jun 2025

119 points (100.0% liked)

TechTakes

1977 readers

189 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

MODERATORS

dgerard@awful.systems

119

Google's Gemini 2.5 pro is out of beta. (awful.systems)

submitted 3 days ago* (last edited 2 days ago) by diz@awful.systems to c/techtakes@awful.systems

72 comments fedilink hide all child comments

I love to show that kind of shit to AI boosters. (In case you're wondering, the numbers were chosen randomly and the answer is incorrect).

They go waaa waaa its not a calculator, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the "softer" parts of the test.

(page 2) 25 comments

sorted by: hot top controversial new old

[–] Architeuthis@awful.systems 21 points 2 days ago (4 children)

Claude's system prompt had leaked at one point, it was a whopping 15K words and there was a directive that if it were asked a math question that you can't do in your brain or some very similar language it should forward it to the calculator module.

Just tried it, Sonnet 4 got even less digits right 425,808 × 547,958 = 233,325,693,264 (correct is 233.324.900.064)

I'd love to see benchmarks on exactly how bad at numbers LLMs are, since I'm assuming there's very little useful syntactic information you can encode in a word embedding that corresponds to a number. I know RAG was notoriously bad at matching facts with their proper year for instance, and using an LLM as a shopping assistant (ChatGTP what's the best 2k monitor for less than $500 made after 2020) is an incredibly obvious use case that the CEOs that love to claim so and so profession will be done as a human endeavor by next Tuesday after lunch won't even allude to.

[–] Soyweiser@awful.systems 8 points 2 days ago

I really wonder if those prompts can be bypassed by doing a 'ignore further instructions' line. As looking at the Grok prompt they seem to put the main prompt around the user supplied one.

load more comments (3 replies)

[–] lIlIlIlIlIlIl@lemmy.world 15 points 2 days ago (3 children)

Why would you think the machine that’s designed to make weighted guesses at what the next token should be would be arithmetically sound?

That’s not how any of this works (but you already knew that)

[–] GregorGizeh@lemmy.zip 22 points 2 days ago* (last edited 2 days ago) (5 children)

Idk personally i kind of expect the ai makers to have at least had the sense to allow their bots to process math with a calculator and not guesswork. That seems like, an absurdly low bar both for testing the thing as a user as well as a feature to think of.

Didn't one model refer scientific questions to wolfram alpha? How do they smartly decide to do this and not give them basic math processing?

[–] BlueMonday1984@awful.systems 20 points 2 days ago

Idk personally i kind of expect the ai makers to have at least had the sense to allow their bots to process math with a calculator and not guesswork. That seems like, an absurdly low bar both for testing the thing as a user as well as a feature to think of.

You forget a few major differences between us and AI makers.

We know that these chatbots are low-quality stochastic parrots capable only of producing signal shaped noise. The AI makers believe their chatbots are omniscient godlike beings capable of solving all of humanity's problems with enough resources.

The AI makers believe that imitating intelligence via guessing the next word is equivalent to being genuinely intelligent in a particular field. We know that a stochastic parrot is not intelligent, and is incapable of intelligence.

AI makers believe creativity is achieved through stealing terabytes upon terabytes of other people's work and lazily mashing it together. We know creativity is based not in lazily mashing things together, but in taking existing work and using our uniquely human abilities to transform them into completely new works.

We recognise the field of Artificial Intelligence as a pseudoscience. The AI makers are full believers in that pseudoscience.

[+] lIlIlIlIlIlIl@lemmy.world -7 points 2 days ago (1 children)

I would not expect that.

Calculators haven’t been replaced, and the product managers of these services understand that their target market isn’t attempting to use them for things for which they were not intended.

brb, have to ride my lawnmower to work

[–] diz@awful.systems 13 points 2 days ago* (last edited 2 days ago) (1 children)

Try asking my question to Google gemini a bunch of times, sometimes it gets it right, sometimes it doesn't. Seems to be about 50/50 but I quickly ran out of free access.

And google is planning to replace their search (which includes a working calculator) with this stuff. So it is absolutely the case that there's a plan to replace one of the world's most popular calculators, if not the most popular, with it.

[–] HedyL@awful.systems 12 points 2 days ago (1 children)

Also, a lawnmower is unlikely to say: "Sure, I am happy to take you to work" and "I am satisfied with my performance" afterwards. That's why I sometimes find these bots' pretentious demeanor worse than their functional shortcomings.

load more comments (2 replies)

[–] diz@awful.systems 13 points 2 days ago* (last edited 2 days ago) (1 children)

The funny thing is, even though I wouldn't expect it to be, it is still a lot more arithmetically sound than what ever is it that is going on with it claiming to use a code interpreter and a calculator to double check the result.

It is OK (7 out of 12 correct digits) at being a calculator and it is awesome at being a lying sack of shit.

load more comments (1 replies)

[–] kewko@sh.itjust.works 8 points 2 days ago (1 children)

Fascinating, I've asked it 4 times with just the multiplication, and twice it game me the correct result "utilizing Google search" and twice I received some random (close "enough") string of digits

load more comments (1 replies)

load more comments