this post was submitted on 13 Jul 2024
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
founded 1 year ago
This is a weird kind of assertion. First of all, you could make facts a token value in an LLM if you had some pre-calculated truth value for your data set. That's not how it works now, but it's a weird assertion to make about an unknown new generation of AI. As the author points out, facts kind of are a data type; it's just that the AI considers the words most related to the prompt to be the most correct, which of course, with a bad data set, they are not.
Also, the current generation of AI, as admitted by the company, is not meant to be a tool for finding facts. It's a tool for generation: yes, a bit like an auto-complete, but for natural language and with a much, much wider scope.
What Strawberry apparently is, is a machine that reasons, which is NOT similar to anything OpenAI ever claimed ChatGPT was. It's like a guy promised to bring a new animal to the village that will be able to pull the plow, and the author is saying "this guy's full of shit! We have cats all over the village and even the biggest one could never pull a plow! They aren't designed for it! All animals are good for is catching mice!" And the guy brings in an ox.
Edit: honestly, my opinion of AI is lukewarm. I'm with a lot of people in thinking that the hype around integrating it into all sorts of nonsense is stupid. It's just that all of the bad arguments against it make me tired.
An extra bit of labeling on your training data set really doesn't help you that much. LLMs already make up plausible-looking citations and website links (and other data types) that are actually complete garbage, even though their training data has valid citations and website links (and other data types). Labeling things as "fact" and forcing the LLM to output stuff with that "fact" label will get you output that looks (in terms of statistical structure) like valid labeled "facts" but has absolutely no guarantee of being true.
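To make that concrete, here's a toy sketch. Big caveats: a bigram model is only a stand-in and is nothing like a real LLM, and the `<fact>` label, the three training sentences, and the function names are all made up for illustration. The point is just that a model which only learns "what tends to follow what" will happily recombine true labeled statements into false ones that carry the same label.

```python
from collections import defaultdict

FACT = "<fact>"  # hypothetical label token, not from any real system
END = "<end>"

# Toy training set: every sentence is true and gets the "fact" label.
TRAINING = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "rome is the capital of italy",
]

def train_bigrams(sentences):
    """Record which token was seen following which in the labeled data."""
    follows = defaultdict(set)
    for sentence in sentences:
        tokens = [FACT] + sentence.split() + [END]
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current].add(nxt)
    return follows

def all_outputs(follows):
    """Enumerate every labeled sentence the model is able to emit.

    This terminates only because the toy data's bigram graph has no cycles.
    """
    results = []
    stack = [[FACT]]
    while stack:
        path = stack.pop()
        for nxt in sorted(follows[path[-1]]):
            if nxt == END:
                results.append(" ".join(path))
            else:
                stack.append(path + [nxt])
    return sorted(results)

model = train_bigrams(TRAINING)
outputs = all_outputs(model)
labeled_training = {f"{FACT} {s}" for s in TRAINING}
novel = [s for s in outputs if s not in labeled_training]

print(len(outputs), "labeled outputs,", len(novel), "not in the training data")
for s in novel:
    print(s)
```

Every output starts with `<fact>` and has exactly the statistical shape of the training data, but most of them (e.g. `<fact> paris is the capital of germany`) are false. The label constrains the form of the output, not its truth.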