this post was submitted on 23 Jan 2024
45 points (66.2% liked)

Technology


I fucked with the title a bit. What i linked to was actually a mastodon post linking to an actual thing. but in my defense, i found it because cory doctorow boosted it, so, in a way, i am providing the original source here.

please argue. please do not remove.

[–] charonn0@startrek.website 58 points 10 months ago (3 children)

I think we should have a rule that says if an LLM company invokes fair use on the training inputs then the outputs are public domain.

[–] Steve@communick.news 26 points 10 months ago* (last edited 10 months ago) (1 children)

That's already been ruled on once.

A recent lawsuit challenged the human-authorship requirement in the context of works purportedly “authored” by AI. In June 2022, Stephen Thaler sued the Copyright Office for denying his application to register a visual artwork that he claims was authored “autonomously” by an AI program called the Creativity Machine. Dr. Thaler argued that human authorship is not required by the Copyright Act. On August 18, 2023, a federal district court granted summary judgment in favor of the Copyright Office. The court held that “human authorship is an essential part of a valid copyright claim,” reasoning that only human authors need copyright as an incentive to create works. Dr. Thaler has stated that he plans to appeal the decision.

Why would companies care about copyright of the output? The value is in the tool that creates it. The whole issue, to me, revolves around the AI company profiting from its service, a service built on a massive library of copyrighted works. It seems clear to me that a large portion of their revenue should go equally to the owners of the works in their database.

[–] Even_Adder@lemmy.dbzer0.com 11 points 10 months ago (1 children)

You can still copyright AI works, you just can't name an AI as the author.

[–] Steve@communick.news 9 points 10 months ago (1 children)

That's just saying you can claim copyright if you lie about authorship. The problem then is, you may step into the realm of fraud.

[–] Even_Adder@lemmy.dbzer0.com 8 points 10 months ago

You don't have to lie about authorship. You should read the guidance.

[–] NevermindNoMind@lemmy.world 33 points 10 months ago (1 children)

Google scanned millions of books and made them available online. Courts ruled that was fair use because the purpose and interface didn't lend themselves to actually reading the books in Google Books, only to searching them for information. If that is fair use, then I don't see how training an LLM (which doesn't retain an exact copy of the training data, at least in the vast majority of cases) isn't fair use. You aren't going to get an argument from me.

I think most people who will disagree are reflexively anti AI, and that's fine. But I just haven't heard a good argument that AI training isn't fair use.

[–] commie@lemmy.dbzer0.com 5 points 10 months ago (1 children)

here's a sidechannel attack on your position: every use, even an infringing use, is fair use until adjudicated, because what fair use means is that a court has agreed that your infringing use is allowed. so of course ai training (broadly) is always fair use. but particular instances of ai training may be found to not be fair use, and so we can't be sure that you are always going to be right (for the specific ai models that may come into question legally).

[–] semperverus@lemmy.world 9 points 10 months ago (1 children)

"It's perfectly legal unless you get caught!"

[–] yuki2501@lemmy.world 19 points 10 months ago (7 children)

What constitutes fair use?

17 U.S.C. § 107

Notwithstanding the provisions of sections 17 U.S.C. § 106 and 17 U.S.C. § 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.

GenAI training, at least regarding art, is neither criticism, comment, news reporting, scholarship, nor research.

AI training is not done by scientists but by engineers of a corporate entity with a long-term profit goal.

So, by elimination, we can conclude that none of the purposes covered by the fair use doctrine apply to Generative AI training.

Q.E.D.

[–] General_Effort@lemmy.world 7 points 10 months ago (1 children)

"Such as" means that these are examples and not an exhaustive list.

Can you explain how the 3 factors you listed rule out a scholarship or research purpose? Regarding the first factor, how do you determine that AI developers are all engineers and never computer scientists?

[–] TheFriar@lemm.ee 4 points 10 months ago (4 children)

I’d argue that the community benefit aspect of the “scholarship or research purposes” language precludes for-profit AI companies from falling under fair use. These aren’t education programs. They’re not research for the greater good. They are private entities trying to create a machine that can copy until it creates. For their own needs, not the greater good. Education has a net positive effect on society, and those stipulations in the law are meant to better serve the whole.

If these generative AI machines were being built by students, it would fall under these specifications of fair use. But the profit motive changes everything.

I’d say “fair use” pretty much covers educational and community benefit. Private companies do neither. They are stealing and reproducing for themselves, not society.

[–] toast@retrolemmy.com 4 points 10 months ago (5 children)

You skipped right over "teaching".

Why is that?

[–] Even_Adder@lemmy.dbzer0.com 5 points 10 months ago
[–] cyd@lemmy.world 2 points 10 months ago (3 children)

Agreed. I would also argue that trained model weights are not copyrightable.
