this post was submitted on 11 Jan 2024

233 points (100.0% liked)

OpenAI says it’s “impossible” to create useful AI models without copyrighted material (arstechnica.com)

submitted 1 year ago by sculd@beehaw.org to c/technology@beehaw.org

114 comments

Apparently, stealing other people's work to create a product for money is now "fair use," according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."

[–] noorbeast@lemmy.zip 52 points 1 year ago* (last edited 1 year ago) (4 children)

I will repeat what I have proffered before:

If OpenAI stated that it is impossible to train leading AI models without using copyrighted material, then, unpopular as it may be, the preemptive pragmatic solution should be pretty obvious: enter into commercial arrangements for access to said copyrighted material.

Claiming a failure to do so in circumstances where the subsequent commercial product directly competes in a market seems disingenuous at best, given what I assume is the purpose of copyrighted material, that being to set the terms under which public facing material can be used. Particularly if regurgitation of copyrighted material seems to exist in products inadequately developed to prevent such a simple and foreseeable situation.

Yes, I am aware of the US concept of fair use, but the test of that should be manifestly reciprocal. For example, would Meta allow others to do what it did to MySpace (hack it and allow easy user transfer), or would Google allow scraping of YouTube?

To me it seems Big Tech wants to have its cake and eat it, where investor $$$ are used to corrupt open markets, undermine fundamental democratic State social institutions, manipulate legal processes, and undermine basic consumer rights.

[–] sculd@beehaw.org 34 points 1 year ago (1 children)

Agreed.

There is nothing "fair" about the way OpenAI steals other people's work. ChatGPT is being monetized all over the world, and the large number of people whose work has not been compensated will never see a cent of that money.

At the same time, the LLM will be used to replace (at least some of) the people who created those works in the first place.

Tech bros are disgusting.

[–] Omega_Haxors@lemmy.ml 12 points 1 year ago (1 children)

Tech bros are disgusting.

That's not even getting into the fraternity behavior at work, hyper-reactionary politics and, er, concerning age preferences.

[–] sculd@beehaw.org 8 points 1 year ago

Yup. I said it in another discussion before, but I think it's relevant here.

Tech bros are more dangerous than Russian oligarchs. Oligarchs understand that people hate them, so they mostly stay low and enjoy their money.

Tech bros think they are the saviors of the world while destroying millions of people's livelihoods, as well as destroying democracy with their right-wing libertarian politics.

[–] TheFreezinSteven@beehaw.org 9 points 1 year ago* (last edited 1 year ago) (1 children)

With your logic all artists will have to pay copyright fees just to learn how to draw. All musicians will have to pay copyright fees just to learn their instrument.

I guess I should clarify by saying I'm a professional musician.

[–] chahk@beehaw.org 12 points 1 year ago* (last edited 1 year ago) (1 children)

Do musicians not buy the music that they want to listen to? Should they be allowed to torrent any MP3 they want just because they say it's for their instrument learning?

I mean I'd be all for it, but that's not what these very same corporations (including Microsoft when it comes to software) wanted back during Napster times. Now they want a separate set of rules just for themselves. No! They get to follow the same laws they force down our throats.

[–] Nacktmull@lemm.ee 44 points 1 year ago

The problem is not the use of copyrighted material. The problem is doing so without permission and without paying for it.

[–] sub_@beehaw.org 37 points 1 year ago* (last edited 1 year ago) (2 children)

https://petapixel.com/2024/01/03/court-docs-reveal-midjourney-wanted-to-copy-the-style-of-these-photographers/

What's stopping AI companies from paying royalties to artists they ripped off?

Also, lol at accounts created within a few hours just to reply in this thread.

The moment it's their work that gets stolen by big companies and they are the ones driven out of business, watch their tune change.

Edit: I remember when Reddit did that shitshow, and all of a sudden a lot of sock / bot accounts appeared. I wasn't expecting it to happen here, but I guess the election cycle is near.

[–] furrowsofar@beehaw.org 14 points 1 year ago (8 children)

Money is not always the issue. Take FOSS, for example: who wants their FOSS gobbled up by a commercial AI regardless? So there are a variety of issues.

[–] sanzky@beehaw.org 9 points 1 year ago* (last edited 1 year ago) (1 children)

What’s stopping AI companies from paying royalties to artists they ripped off?

Profit. AI is not even a profitable business now. These companies exist because of the huge amount of investment being poured into them. If they had to pay their fair share, they would not exist as businesses.

What OpenAI says is actually true. The issue, IMHO, is the idea that we should give them a pass to do it.

[–] sub_@beehaw.org 11 points 1 year ago (1 children)

Uber wasn't making a profit anyway, despite all the VC money behind it.

I guess they have reasons not to pay drivers properly. Give Uber a free pass for it too.

[–] sculd@beehaw.org 32 points 1 year ago

Some relevant comments from Ars:

leighno5

The absolute hubris required for OpenAI here to come right out and say, 'Yeah, we have no choice but to build our product off the exploitation of the work others have already performed' is stunning. It's about as perfect a representation of the tech bro mindset that there can ever be. They didn't even try to approach content creators in order to do this, they just took what they needed because they wanted to. I really don't think it's hyperbolic to compare this to modern day colonization, or worker exploitation. 'You've been working pretty hard for a very long time to create and host content, pay for the development of that content, and build your business off of that, but we need it to make money for this thing we're building, so we're just going to fucking take it and do what we need to do.'

The entitlement is just...it's incredible.

4qu4rius

20 years ago, high school kids were sued for millions and threatened with years in jail for downloading a single Metallica album (if I remember correctly, minimum damages in the US were something like $500k per song).

All of a sudden, just because they are the dominant ones doing the infringement, they should be allowed to scrape the entire (digital) human knowledge? Funny (or not) how the law always benefits the rich.

[–] lily33@lemm.ee 30 points 1 year ago (3 children)

This is not REALLY about copyright - this is an attack on free and open AI models, which would be IMPOSSIBLE if copyright was extended to cover the case of using the works for training.
It's not stealing. There is literally no resemblance between the training works and the model. IP rights have been continuously strengthened due to lobbying over the last century and are already absurdly strong, I don't understand why people on here want so much to strengthen them ever further.

[–] MNByChoice@midwest.social 13 points 1 year ago (5 children)

I don’t understand why people on here want so much to strengthen them ever further.

It is about a lawless company doing lawless things. Some of us want companies to follow the spirit, or at least the letter, of the law. We can change the law, but we need to discuss that.

[–] sculd@beehaw.org 12 points 1 year ago (1 children)

Sorry, AIs are not humans. Also, executives like Altman are literally being paid millions to steal creators' work.

[–] lily33@lemm.ee 7 points 1 year ago (1 children)

I didn't say anything about AIs being humans.

[–] chahk@beehaw.org 11 points 1 year ago (2 children)

Agreed on both counts. Except Microsoft sings a different tune when their software is being "stolen" in the exact same way. They want to have it both ways - calling us pirates when we copy their software, but it's "without merit" when they do it. Fuck'em! Let them play by the same rules they want everyone else to play by.

[–] SilentStorms@lemmy.dbzer0.com 27 points 1 year ago (5 children)

It's crazy how everyone is suddenly in favour of IP law.

[–] t3rmit3@beehaw.org 19 points 1 year ago* (last edited 1 year ago) (3 children)

IP law used to stop corporations from profiting off of creators' labor without compensation? Yeah, absolutely.

IP law used to stop individuals from consuming media where purchases wouldn't even go to the creators, but some megacorp? Fuck that.

I'm against downloading movies by indie filmmakers without compensating them. I'm not against downloading films from Universal and Sony.

I'm against stealing food from someone's garden. I'm not against stealing food from Safeway.

If you stop looking at corporations as being the same as individuals, it's a very simple and consistent viewpoint.

IP law shouldn't exist, but if it does it should only exist to protect individuals from corporations. When that's how it's being used, like here, I accept it as a necessary evil.

[–] mnglw@beehaw.org 17 points 1 year ago* (last edited 1 year ago)

I'm not so much in favor of IP law as I am in favor of informed consent in every aspect of the word.

When posting photos, art and text content years ago, I was not able to imagine it might be used to train an AI. As such, I was not able to make a decision based on informed consent about whether I agreed to that or not.

Even though quotes such as "once you post it, it's on the internet forever" were around, I was not aware of the extent to which this reached, or that, had my art been vacuumed up by a generative AI model (it hasn't, luckily), people could create art that pretends to be created by me. Thus I could not consent.

I think this goes for a lot of artists actually, especially those who exist far more publicly than I do, who are in those databases and who are a keyword to be used in prompts. There is no possible way they could have given informed consent to that at the time they posted art, or at the time they started that social media profile, YouTube channel, etc.

To me, this is the real problem. I couldn't care less about corporations.

[–] interdimensionalmeme@lemmy.ml 10 points 1 year ago

I still think IP needs to eat shit and die. Always has, always will.

I recently found out we could have had 3D printing 20 years earlier, but patents stopped that. Cocks!

[–] explodicle@local106.com 27 points 1 year ago (4 children)

Having read through these comments, I wonder if we've reached the logical conclusion of copyright itself.

[–] sanzky@beehaw.org 27 points 1 year ago

Copyright has become a tool of oppression. Individual authors' copyrights are constantly being violated, with few resources for them to fight back, while big tech abuses others' work and big media uses theirs to the point of it being censorship.

[–] frog@beehaw.org 20 points 1 year ago (2 children)

Perhaps a fair compromise would be doing away with copyright in its entirety, from the tiny artists trying to protect their artwork all the way up to Disney, no exceptions. Basically, either every creator has to be protected, or none of them should be.

[–] zaphod@lemmy.ca 15 points 1 year ago* (last edited 1 year ago) (3 children)

IMO the right compromise is to return copyright to its original 14-year term. OpenAI can freely train on anything up to 2009, which is still a gigantic amount of material, while artists continue to be protected and incentivized.

[–] Powderhorn@beehaw.org 24 points 1 year ago (2 children)

Any reasonable person can reach the conclusion that something is wrong here.

What I'm not seeing a lot of acknowledgement of is who really gets hurt by copyright infringement under the current U.S. scheme. (The quote is obviously directed toward the UK, but I'm reasonably certain a similar situation exists there.)

Hint: It's rarely the creators, who usually get paid once while their work continues to make money for others.

Let's say the New York Times wins its lawsuit. Do you really think the reporters who wrote the infringed-upon material will be getting royalty checks to be made whole?

This is not OpenAI vs creatives. OK, on a basic level it is, but expecting no one to scrape blogs and forum posts rather goes against the idea of the open internet in the first place. We've all learned by now that what goes on the internet stays there, with attribution totally optional unless you have a legal department. What's novel here is the scale of scraping, but I see some merit to the "transformational" fair-use defense given that the ingested content is not being reposted verbatim.

This is corporations vs corporations. Framing it as millions of people missing out on what they'd have otherwise rightfully gotten is disingenuous.

[–] lemmyvore@feddit.nl 17 points 1 year ago* (last edited 1 year ago) (1 children)

This isn't about scraping the internet. The internet is full of crap and the LLMs will add even more crap to it. It will shortly become exponentially harder to find the meaningful content on the internet.

No, this is about dipping into high quality, curated content. OpenAI wants to be able to use all existing human artwork without paying anything for it, and then flood the world with cheap knockoff copies. It's that simple.

[–] towerful@programming.dev 10 points 1 year ago (1 children)

Shortly? It's happening already. I notice it when using Google and DuckDuckGo. There are always a few hits that are AI-written blog-spam word soup.

[–] lemmyvore@feddit.nl 9 points 1 year ago (2 children)

Unfortunately you haven't seen the full impact of LLMs yet. What you're seeing now is stuff that's already been going on for a decade. SEO content generators have been a thing for many years and used by everybody from small business owners to site chains pinching ad pennies.

When the LLM crap kicks in, you won't see anything except their links. I wouldn't be surprised if we have to go back to 90s tech and use human-curated webrings and directories.

[–] casmael@startrek.website 23 points 1 year ago (2 children)

Well, in that case maybe ChatGPT should just fuck off. It doesn’t seem to be doing anything particularly useful, and now its creator has admitted it doesn’t work without stealing things to feed it. Unfuckingbelievable. Hacks gonna hack, I guess.

[–] drwho@beehaw.org 17 points 1 year ago

As with many things, the golden rule applies. They who have the gold, make the rules.

[–] KingThrillgore@lemmy.ml 16 points 1 year ago

...so stop doing it!

This explains why Valve was, until recently, not so cavalier about AI: they didn't want to be left holding the bag on copyright matters outside of their domain.

[–] jlow@beehaw.org 15 points 1 year ago (1 children)

It's also "impossible" to have multiple terabytes of media on my homeserver without copyright infringement, so piracy is ok, right!?

Oh no, wait, it actually is possible. It's just more expensive and more work to do it legally (and leaves a lot of plastic trash in the form of Blu-rays and DVDs), just like with AI. But laws are just for poor people, I guess.

[–] fckreddit@lemmy.ml 15 points 1 year ago

Then shut down your goddamn company until you find a better way.

[–] Pratai@lemmy.ca 14 points 1 year ago* (last edited 1 year ago) (2 children)

I stand by my opinion that AI will be the worst thing humans ever created, and that means it ranks just a bit above religion.

[–] sculd@beehaw.org 7 points 1 year ago

This is very likely to be true.

[–] qyron@sopuli.xyz 13 points 1 year ago (5 children)

If it is impossible, either shut down operations or find a way to pay for it.

[–] FracturedPelvis@lemmy.ml 11 points 1 year ago (1 children)

The real issue is money. How much and how (un)distributed.

Why is it fair/ok that one company can use all this material and make a lot of money off it without paying for or even acknowledging others' work?

On the flip side, AI models could be useful. Maybe the models/weights should be made free, just like the content they are trained on. Instead of paying for the model, we should pay for the hosting of the inference (a.k.a. the API).

[–] vexikron@lemmy.zip 11 points 1 year ago* (last edited 1 year ago) (3 children)

Or, or, or, hear me out:

Maybe their particular approach to making an AI is flawed.

It's like people do not know that there are many different approaches to building AI.

Many of them do not rely on a training set that is basically the cumulative sum of all human-generated content of every imaginable kind.

[–] Kolanaki@yiffit.net 11 points 1 year ago* (last edited 1 year ago)

Then pay for the material like everyone else who can't do things without someone else's copyrighted materials.

[–] ky56@aussie.zone 10 points 1 year ago (1 children)

All the AI race has done is surface the long-standing issue of how broken copyright is for the internet era. Artists should be compensated, but trying to do that using the traditional model, which was originally designed with physical, non-infinitely-copyable goods in mind, is just asinine.

One alternative model could be to assign copyright ownership automatically at first upload on any platform that supports the API: an API provided and enforced by the US Copyright Office. A percentage of the proceeds from the end use case could be paid back as royalties. I haven't really thought out this model much further than this.

Machine learning is here to stay and is a useful tool that can be used for good and evil things alike.

[–] Kichae@lemmy.ca 9 points 1 year ago (1 children)

Nah. Copyright is broken, but it's broken because it lasts too long, and it can be held by constructs. People should still reserve the right to not have the things they've made incorporated into projects or products they don't want to be associated with.

The right to refusal is important. Consent is important. The default permission should not be shifted to "yes" in anybody's mind.

The fact that a not insignificant number of people seem to think the only issue here is money points to some pretty fucking entitled views among the would-be-billionaires.

[–] onlinepersona@programming.dev 10 points 1 year ago (1 children)

Wait, so if the way I make money is illegal now, it's the system's fault, isn't it? That means I can keep going because I believe I'm justified, right? Right?

CC BY-NC-SA 4.0

[–] furrowsofar@beehaw.org 8 points 1 year ago

Of course it is. About 50 years ago we went to a regime where everything is copyrighted, rather than just things that were marked and registered. Not sure where I stand on that. One could argue we are in a crazy over-copyright era now anyway.
