scruiser

joined 2 years ago
[–] scruiser@awful.systems 21 points 1 month ago (3 children)

A new LLM plays Pokemon has started, with o3 this time. It plays moderately faster, and the twitch display UI is a little bit cleaner, so it is less tedious to watch. But in terms of actual ability, so far o3 has made many of the exact same errors as Claude and Gemini, including: completely making things up/seeing things that aren't on the screen (items in Viridian Forest), confused attempts at navigation (it went back and forth on whether the exit to Viridian Forest was in the NE or NW corner), repeating mistakes to itself (both the items and the navigation issues I mentioned), confusing details from other generations of Pokemon (Nidoran learns double kick at level 12 in Fire Red and Leaf Green, but not the original Blue/Yellow), and it has signs of being prone to going on completely batshit tangents (it briefly started getting derailed about sneaking through the trees in Viridian Forest... i.e. moving through completely impassable tiles).

I don't know how anyone can watch any of the attempts at LLMs playing Pokemon and think (viable) LLM agents are just around the corner... well actually I do know: hopium, cope, cognitive bias, and deliberate deception. The whole LLM playing Pokemon thing is turning into less of a test of LLMs and more entertainment and advertising of the models, and the scaffolds are extensive enough and different enough from each other that they really aren't showing the models' raw capabilities (which are even worse than I complained about) or comparing them meaningfully.

[–] scruiser@awful.systems 10 points 1 month ago (1 children)

Is that supposed to be an advertisement in favor of AI? (As opposed to stealth satire?) Seeing it makes me want to get off my computer and touch grass.

[–] scruiser@awful.systems 3 points 1 month ago

Wow, that is some skilled modeling. You should become a superforecaster and write ~~prophecies~~ AI timelines, they are quite popular on lesswrong.

[–] scruiser@awful.systems 10 points 1 month ago (2 children)

To elaborate on the other answers about AlphaEvolve: the LLM portion is only one component of AlphaEvolve. The LLM is the generator of random mutations in the evolutionary process. The LLM promoters like to emphasize the involvement of LLMs, but separate from the evolutionary algorithm guiding the process through repeated generations, an LLM is as likely to write good code as a dose of radiation is likely to spontaneously mutate you to be able to breathe underwater.

And the evolutionary aspect requires a lot of compute. They don't specify in their whitepaper how big their population is or the number of generations, but it might be hundreds or thousands of attempted solutions repeated for dozens or hundreds of generations. That means you are running the LLM for thousands or tens of thousands of attempted solutions, and testing that code against the evaluation function every time, to generate one piece of optimized code. This isn't an approach that is remotely affordable or even feasible for software development, even if you reworked your entire software development process to something like test-driven development on steroids in order to try to write enough tests to use them in the evaluation function (and you would probably get stuck on this step, because it outright isn't possible for most practical real world software).

AlphaEvolve's successes are all very specific, very well-defined, and constrained problems: finding specific algorithms as opposed to general software development.
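
To make the shape of that loop concrete, here is a toy sketch of an evolutionary search in Python. It is not AlphaEvolve's actual code: the function names, the population size, the generation count, and the toy objective are all made up for illustration, and `random_mutation` is just a stand-in for wherever the LLM call would go. The point is only to show where the mutation generator sits and how fast the call count grows.

```python
import random


def evolve(evaluate, mutate, seed, population_size=20, generations=30, rng=None):
    """Toy evolutionary search. Returns (best_candidate, number_of_mutate_calls)."""
    rng = rng or random.Random(0)
    population = [seed]
    mutate_calls = 0
    for _ in range(generations):
        children = []
        for _ in range(population_size):
            parent = rng.choice(population)
            # In an LLM-driven setup, this is where the model would be
            # asked to rewrite the parent candidate.
            children.append(mutate(parent, rng))
            mutate_calls += 1
        # Selection: score everything with the evaluation function and keep
        # the top few. Parents stay in the pool, so the best score never drops.
        population = sorted(population + children, key=evaluate, reverse=True)[:5]
    return population[0], mutate_calls


# Hypothetical, trivially well-defined problem: find x that maximizes -(x - 3)^2.
def evaluate(x):
    return -(x - 3.0) ** 2


# Hypothetical stand-in for the LLM: a blind random nudge.
def random_mutation(x, rng):
    return x + rng.gauss(0, 0.5)


best, calls = evolve(evaluate, random_mutation, seed=0.0)
print(f"best candidate: {best:.3f} after {calls} mutation calls")
```

Even with these tiny made-up numbers (20 candidates per generation, 30 generations) that is 600 mutation calls plus scoring on every generation, and it only works because `evaluate` is cheap and fully specified. Swap in real code generation and a real test suite and you get the cost problem described above.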

[–] scruiser@awful.systems 12 points 1 month ago

"You claim to like unions, but seem strangely hostile to police unions. Curious."

  • Turning Point USA

[–] scruiser@awful.systems 3 points 1 month ago

Yep. If you're looking for a snappy summary of this situation, this reddit comment had a nice one. An open source LLM Pokemon harness/scaffold has 4.8k lines of Python, and is missing features essential to Gemini's harness. Whereas an open source Lua script to play Pokemon is 7.2k lines, was written in 2014, and consistently speedruns the game in under two hours.

[–] scruiser@awful.systems 4 points 1 month ago (1 children)

That's unfair.

Beaker deserves better than to get compared to a eugenicist ~~crypto~~fascist.

[–] scruiser@awful.systems 3 points 1 month ago (2 children)

Fellas it’s almost June in the year of the “agents” and frankly I don’t see shit.

LLM agents can beat Pokemon... if you give them enough customized tools and prompting that with the same number of lines of instruction you could just directly code a bot that beats Pokemon without an LLM in the first place. And you don't mind the LLM agent playing much much worse than literal children.

[–] scruiser@awful.systems 6 points 1 month ago* (last edited 1 month ago)

Yeah I pretty much agree. Penrose compares favorably to other cases of Nobel disease because the bar is so low (the Wikipedia page has got examples of racism, eugenics, homeopathy, astrology), not because his ideas about Quantum consciousness are actually good. It's not good to cite Penrose as someone notable who disagrees with the possibility of AGI, because the reason he disagrees is that he believes in Quantum mysticism and misunderstands Gödel's theorem and computer science.

[–] scruiser@awful.systems 7 points 1 month ago (3 children)

Yeah it's really not productive to engage directly.

I'd almost categorize Penrose as a borderline case of Nobel disease himself for stuff he's said about Quantum Consciousness and relatedly the halting problem and Gödel's incompleteness theorem. But he actually has a proposed mechanism (involving microtubules) that is testable and falsifiable, and the physics half of what he is talking about is within his domain of expertise.

[–] scruiser@awful.systems 8 points 1 month ago* (last edited 1 month ago) (5 children)

Stephen Hawking was starting to promote AI doomerism in 2014. But he's not a Nobel prize winner. Yoshua Bengio is a doomer, but no Nobel prize either, although he is pretty decorated in awards. So yeah, looks like one winner and a few other notable doomers that aren't actually Nobel Prize winners somehow became winners plural in Scott's argument from authority. Also, considering the long list of examples of Nobel disease, I really don't think Nobel Prize winner endorsement is a good way to gauge experts' attitudes or sentiment.

[–] scruiser@awful.systems 10 points 1 month ago (2 children)

He claims he was explaining what others believe, not what he believes, but if that is so, why is he so aggressively defending the stance?

Literally the only difference between Scott's beliefs and AI 2027 as a whole is that his ~~prophecy~~ estimate is a year or two later. (I bet he'll be playing up that difference as AI 2027 fails to happen in 2027, then also doesn't happen in 2028.)

Elsewhere in the thread he whines to the mods that the original poster is spamming every subreddit vaguely lesswrong or EA related with engagement bait. That poster is katxwoods... as in Kat Woods... as in a member of Nonlinear, the EA "organization" whose idea of philanthropic research was nonstop exotic vacations around the world. And, iirc, they are most infamous among us sneerers for "hiring" an underpaid (really underpaid, like couldn't afford basic necessities) intern they also used as a 24/7 live-in errand girl, drug runner, and sexual servant.
