this post was submitted on 10 Apr 2024
1276 points (99.0% liked)
Programmer Humor
19594 readers
859 users here now
Welcome to Programmer Humor!
This is a place where you can post jokes, memes, humor, etc. related to programming!
For sharing awful code theres also Programming Horror.
Rules
- Keep content in english
- No advertisements
- Posts must be related to programming or programmer topics
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
The fallout of image generation will be even more incredible imo. Even if models do become even more capable, training off of post-'21 data will become increasingly polluted and difficult to distinguish as models improve their output, which inevitably leads to model collapse. At least until we have a standardized way of flagging generated images opposed to real ones, but I don't really like that future.
Just on a tangent, openai claiming video models will help "AGI" understand the world around it is laughable to me. 3blue1brown released a very informative video on how text transformers work, and in principal all "AI" is at the moment is very clever statistics and lots of matrix multiplication. How our minds process and retain information is by far more complicated, as we don't fully understand ourselves yet and we are a grand leap away from ever emulating a true mind.
All that to say is I can't wait for people to realize: oh hey that is just to try to replace talent in film production coming from silicon valley
Yeah I read one of the papers that talked about this. Essentially putting AGI data into a training set will pollute it, and cause it to just fall apart. Most LLMs especially are going to be a ton of fun as there were absolutely no rules about what to do, and bots and spammers immediately used it everywhere on the internet. And the only solution is to.... write a model to detect it. Which then they'll make models that bypass that, and there will just be no way to keep the dataset clean.
The hype of AI is warranted - but also way overblown. Hype from actual developers and seeing what it can do when it's tasked with doing something appropriate? Blown away. Just honestly blown away. However hearing what businesses want to do with it, the crazy shit like "We'll fire everyone and just let AI do it!" Impossible. At least with the current generation of models. Those people remind me of the crypto bros saying it's going to revolutionize everything. It might, but you need to actually understand the tech and it's limitations first.
Building my own training set is something I would certainly want to do eventually. Ive been messing with Mistral Instruct using GPT4ALL and its genuinely impressive how quick my 2060 can hallucinate relatively accurate information, but its also evident of limitations. IE I tell it I do not want to use AWS or another cloud hosting service, it will just return a list of suggested services not including AWS. Most certainly a limit of its training data but still impressive.
Anyone suggesting to use LLMs to manage people or resources are better off flipping a coin on every thought, more than likely companies who are insistent on it will go belly up soon enough
You're describing an arms race, which makes me wonder if that's part of the path to AGI. Ultimately the only way to truly detect a fake is to compare it to reality, and the only way to train a model to understand whether it is looking at reality or a generated image is to teach it to understand context and meaning, and that's basically the ballgame at that point. That's a qualitative shift, and in that scenario we get there with opposing groups each pursuing their own ends, not with a single group intentionally making AGI.
It's definitely a qualitative shift. I suspect most of the fundamental maths of neural network matrices won't need to change, because they are enough to emulate the lower level functions of our brains. We have dedicated parts of our brain for image recognition, face recognition, language interpretation, and so on, very analogous to the way individual NNs do those same functions. We got this far with biomimicry, and it's fascinating to me that biomimicry on the micro level is naturally turning into biomimicry on a larger scale. It seems reasonable to believe that process will continue.
Perhaps some subtle tuning of those matrices is needed to really replicate a mind, but I suspect the actual leap will require first of all a massive increase in raw computation, as well as some new insight into how to arrange all of those subsystems within a larger structure.
What I find interesting is the question of whether AI can actually fully replace a person in a job without crossing that threshold and becoming AGI, and I genuinely don't think it can. Sure it'll be able to automate some very limited tasks, but without the capacity to understand meaning it can't ever do real problem solving. I think past that point it has to be considered a person with all of the ethical implications that has, and I think tech bros intentionally avoid acknowledging that, because that would scare investors.
I'm sure it would be pretty simple to put a simple code in the pixels of the image, could probably be done with offset of alpha channel or whatever, using relative offsets or something like that. I might be dumb but fingerprinting the actual image should be relatively quick forward and an algorithm could be used to detect it, of course it would potentially be damaged by bad encoding or image manipulation that changes the entire image. but most people are just going to be copy and pasting and any sort of error correction and duplication of the code would preserve most of the fingerprint.
I'm a dumb though and I'm sure there is someone smarter than me who actually does this sort of thing who will read this and either get angry at the audacity or laugh at the incompetence.
I see this a lot, but do you really think the big players haven't backed up the pre-22 datasets? Also, synthetic (LLM generated) data is routinely used in fine tuning to good effect, it's likely that architectures exist that can happily do primary training on synthetic as well.
AIs can be trained to detect AI generated images, so then the race is only whether the AI produced images get better faster than the detector can keep up or not.
More likely as the technology evolves AIs, like a human, will just train real-time-ish from video taken from it's camera eyeballs.
...and then, of course, it will KILL ALL HUMANS.