this post was submitted on 08 Nov 2024

OpenAI on Thursday won the dismissal of a federal copyright suit brought by digital news websites Raw Story and AlterNet, with a New York judge finding the outlets failed to identify an appropriate injury from the claimed copyright infringement.

[–] Vanth@reddthat.com 32 points 1 month ago (1 children)

Ever seen Superman III where Richard Pryor's character realizes there's rounding in the numbers of his company's payroll, taxes, etc, so he writes a program to skim those partial cents into his account and he ends up shocked at the amount? And he knew it was illegal and everyone who learned of it knew it was illegal?

Companies profiting off AI trained on stolen material feels like that, except the cops who discovered Pryor's crime say "eh, partial cents aren't a real thing anyway. No harm, no foul" and let Pryor keep skimming.

[–] riskable@programming.dev 13 points 1 month ago (2 children)

Except there's nothing illegal about scraping all the content from websites (including news sites) and putting it into your own personal database. That is, after all, how search engines work.

It's only illegal if you then distribute said copyrighted material without the copyright owner's permission. Because that's what copyright is all about: Distribution.

The news sites distributing the content in this case freely gave it to OpenAI's crawlers. It's not like they broke into these organizations in order to copy their databases of news articles.

For the news sites to have a case, they need to demonstrate that OpenAI is creating a "derivative work" using their copyrighted material. However, that's going to be a tough sell to judges and/or juries, since the way LLMs work is not so different from the way humans do: they take in information and then produce similar information (by predicting the next word/symbol, given a series of tokens/a prompt).

If you read all of Stephen King's books, for example, you might be better at writing horror stories. You may even start writing in a similar style! That doesn't mean you're violating his copyright by producing similar stories.
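To make the "predicting the next word, given a series of tokens" description above a bit more concrete, here is a minimal sketch of a toy next-word predictor. It is a simple bigram counter, not how OpenAI's models are actually built (real LLMs use neural networks over subword tokens). The function names and the tiny corpus are hypothetical and invented purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follower_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follower_counts[current_word][next_word] += 1
    return follower_counts


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(model, prompt, max_new_words=5):
    """Extend the prompt one predicted word at a time (greedy choice)."""
    output = prompt.lower().split()
    if not output:
        return ""
    for _ in range(max_new_words):
        next_word = predict_next_word(model, output[-1])
        if next_word is None:
            break  # no known follower, so stop generating
        output.append(next_word)
    return " ".join(output)


# Hypothetical toy corpus, made up for this example only.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram_model(corpus)

print(predict_next_word(model, "the"))  # → cat ("cat" follows "the" twice)
print(generate(model, "sat", max_new_words=2))
```

Even this toy version shows the general shape of the argument: the model stores statistics about which words tend to follow which, and produces new text from those statistics rather than looking up stored articles. Whether that distinction holds for large models that can sometimes reproduce long passages is exactly what the replies below dispute.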

[–] reddig33@lemmy.world 5 points 1 month ago (1 children)

They will need to show plagiarism in the results returned by AI. I bet that won’t be too difficult.

[–] riskable@programming.dev 10 points 1 month ago

It will be difficult because the AI only returns short results (relatively speaking). A sentence or two does not make for copyright infringement.

Yup. These things have been carefully crafted to dodge these legal issues. It will not be easy to pin them down.