this post was submitted on 28 Oct 2024
TechTakes
The stretching is just so blatant. People who train neural networks do not write a bunch of tokens and weights. They take a corpus of training data and run a training program to generate the weights. That's why it is the training program and the corpus that should be considered the source form of the program. If either of these can't be made available in a way that allows redistribution of verbatim and modified versions, it can't be open source. Even if I have a powerful server farm and a list of data sources for Llama 3, I can't replicate the model myself without committing copyright infringement (neither could Facebook for that matter, and that's not an entirely separate issue).
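To make the "source form" point concrete, here is a toy sketch. It assumes a hypothetical two-parameter linear model, a made-up corpus, and a hand-rolled SGD loop, so it is nothing like how Llama 3 is actually trained. The weights fall out of running a program over a corpus. Anyone holding the corpus and the program can rebuild them exactly and change the model's behaviour by editing the data. Anyone holding only the weights can do neither.

```python
import random

# Hypothetical toy "corpus": points on the line y = 2x + 1.
CORPUS = [(x, 2.0 * x + 1.0) for x in range(10)]


def train(corpus, epochs=500, lr=0.01, seed=0):
    """Hypothetical toy training program: fit y = w*x + b with plain SGD.

    The returned (w, b) pair plays the role of the released "weights".
    Nobody writes these numbers by hand; they are the output of running
    this program over the corpus.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    w, b = 0.0, 0.0
    data = list(corpus)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y  # prediction error on one example
            w -= lr * err * x      # gradient step for the slope
            b -= lr * err          # gradient step for the intercept
    return w, b


weights = train(CORPUS)

# Same corpus + same program + same seed -> bit-identical weights.
rebuilt = train(CORPUS)

# "Modifying the program" means editing the corpus and re-running training,
# which you cannot do if all you were given is `weights`.
modified = train([(x, 3.0 * x) for x in range(10)])
```

Here `weights` ends up close to (2, 1), `rebuilt == weights` holds, and `modified` ends up close to (3, 0). That is the difference between shipping source and shipping a build artifact.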
There are large collections of freely licensed and public-domain media that could theoretically be used to train a model, but that model surely wouldn't be as big as the proprietary ones. In some sense, truly open-source AI does exist and has for a long time, but that's not the exciting thing OSI is lusting after, is it?
Reading this made me think of an analogy with generated code. This is basically the same thing as distributing your program not in its source language but as the assembly listing of the final binary, and calling it open source. You can turn any defense of this AI version of "open source" into a defense of that way of distributing code. You can run my AI/code (if you have a powerful/similar enough machine), you can inspect it (it's just not going to tell you anything), you can modify it (lol), so it's open source!
Edit: The more I think about it, the more I realise that the assembly listing is actually still vastly more useful than the AI models. At least a dedicated (and insane) enough programmer could technically track down a bug in the assembly and fix it, given enough coffee.
It's open source trust me I wrote that ELF file directly with C-x M-c M-butterfly.