this post was submitted on 26 Jul 2023
Technology
There's a difference between a sapient creature drawing inspiration and a glorified autocomplete using copyrighted text to produce sentences which are only cogent due to substantial reliance upon those copyrighted texts.
All AI creations are derivative and subject to copyright law.
But the AI is looking at thousands, if not millions, of books, articles, comments, etc. That's what humans do as well - they draw inspiration from a variety of sources. So is sentience the distinguishing criterion for copyright? Only a being capable of original thought can create original work, and therefore anything not capable of original thought cannot create copyrighted work?
Also, irrelevant here but calling LLMs a glorified autocomplete is like calling jet engines a "glorified horse". Technically true but you're trivialising it.
Yes. Creative work is made by creative people. Writing is creative work. A computer cannot be creative, and thus generative AI is a disgusting perversion of what you wanna call “literature”. Fuck, writing and art have always been primarily about self-expression. Computers can’t express themselves with original thoughts. That’s the whole entire point. And this is why humanistic studies are important, by the way.
I absolutely agree with the second half, guided by Ian Kerr's paper "Death of the AI Author"; quoting from the abstract:
I think the part courts will struggle with is if this 'thing' is not an author of the works then it can't infringe either?
The trivialization doesn't negate the point though, and LLMs aren't intelligence.
The AI consumed all of that content, and I would bet that not a single one of the people who created the content was compensated, yet the AI relies strictly on those people to produce anything coherent.
I would argue that yes, generative artificial stupidity doesn't meet the minimum bar of original thought necessary to create a standard copyrightable work unless every input has consent to be used, and laundering content through multiple generations of an LLM or through multiple distinct LLMs should not impact the need for consent.
Without full consent, it's just a massive loophole for those with money to exploit the hard work of the masses who generated all of the actual content.
The thing is, these models aren't aiming to re-create the work of any single author, but merely to put words in the right order. Imo, if we allow authors to copyright the order of their words instead of their whole original creations, then we are actually reducing the threshold for copyright protection and (again imo) increasing the number of acts that would be determined to be copyright protected.
But for text to be a derivative work of other text, you need to be able to tell that by looking at the two texts and comparing them.
Training an AI on a copyrighted work might necessarily involve making copies of the work that would be illegal to make without a license. But the output of the AI model is only going to be a for-copyright-purposes derivative work of any of the training inputs when it actually looks like one.
Did the AI regurgitate your book? Derivative work.
Did the AI spit out text that isn't particularly similar to any existing book? Which, if written by a human, would have qualified as original? Then it can't be a derivative work. It might not itself be a copyrightable product of authorship, having no real author, but it can't be secretly a derivative work in a way not detectable from the text itself.
Otherwise we open ourselves up to all sorts of claims along the lines of "That book looks original, but actually it is a derivative work of my book because I say the author actually used an AI model trained on my book to make it! Now I need to subpoena everything they ever did to try and find evidence of this having happened!"