this post was submitted on 11 Jan 2024

Technology

Apparently, stealing other people's work to create a product for money is now "fair use," according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."

[–] lily33@lemm.ee 30 points 10 months ago (3 children)
  1. This is not REALLY about copyright - this is an attack on free and open AI models, which would be IMPOSSIBLE if copyright were extended to cover the use of works for training.
  2. It's not stealing. There is literally no resemblance between the training works and the model. IP rights have been continuously strengthened by lobbying over the last century and are already absurdly strong. I don't understand why people on here want so badly to strengthen them even further.
[–] MNByChoice@midwest.social 13 points 10 months ago (1 children)

I don’t understand why people on here want so much to strengthen them ever further.

It is about a lawless company doing lawless things. Some of us want companies to follow the spirit, or at least the letter, of the law. We can change the law, but we need to discuss that.

[–] explodicle@local106.com 4 points 10 months ago (1 children)

IANAL, why isn't it fair use?

[–] maynarkh@feddit.nl 9 points 10 months ago (1 children)

The two big arguments are:

  • Substantial reproduction of the original work: you can get back substantial portions of the original work from an AI model's output.
  • The AI model replaces the use of the original work. In short, a work that uses copyrighted material under fair use can't be a replacement for the original work.
[–] intensely_human@lemm.ee 1 points 10 months ago (1 children)

you can get back substantial portions of the original work from an AI model's output

Have you confirmed this yourself?

[–] chaos@beehaw.org 5 points 10 months ago (1 children)

In its complaint, The New York Times alleges that because the AI tools have been trained on its content, they sometimes provide verbatim copies of sections of Times reports.

OpenAI said in its response Monday that so-called “regurgitation” is a “rare bug,” the occurrence of which it is working to reduce.

“We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use,” OpenAI said.

The tech company also accused The Times of “intentionally” manipulating ChatGPT or cherry-picking the copycat examples it detailed in its complaint.

https://www.cnn.com/2024/01/08/tech/openai-responds-new-york-times-copyright-lawsuit/index.html

The thing is, it doesn't really matter if you have to "manipulate" ChatGPT into spitting out training material word-for-word; the fact that it's possible at all is proof that, intentionally or not, that material has been encoded into the model itself. That might still be fair use, but it's a lot weaker than the original argument, which was that nothing of the original material really remains after training: it's all synthesized and blended with everything else to create something entirely new that doesn't replicate the original.

[–] intensely_human@lemm.ee 1 points 10 months ago

So that’s a no? Confirming it yourself here means doing it yourself. Have you gotten it to regurgitate a copyrighted work?

[–] sculd@beehaw.org 12 points 10 months ago (1 children)

Sorry, AIs are not humans. Also, executives like Altman are literally being paid millions to steal creators' work.

[–] lily33@lemm.ee 7 points 10 months ago (1 children)

I didn't say anything about AIs being humans.

[–] intensely_human@lemm.ee 3 points 10 months ago

They’re also not vegetables 😡

[–] chahk@beehaw.org 11 points 10 months ago (1 children)

Agreed on both counts. Except Microsoft sings a different tune when their software is being "stolen" in the exact same way. They want to have it both ways: calling us pirates when we copy their software, but it's "without merit" when they do it. Fuck 'em! Let them play by the same rules they want everyone else to play by.

[–] intensely_human@lemm.ee 1 points 10 months ago (1 children)

That sounds bad. Do you have evidence for MS behaving this way?