TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

Meta beats Kadrey, AI training was fair use — what this means (pivot-to-ai.com)

submitted 2 days ago by dgerard@awful.systems to c/techtakes@awful.systems

podcast version
video version

[–] corbin@awful.systems 1 points 1 hour ago

Read carefully. On p1-2, the judge makes it clear that "the incentive for human beings to create artistic and scientific works" is "the ability of copyright holders to make money from their works"; to the law, there isn't any other reason to publish art. This is why I'm so dour on copyright, folks; it's not for you who love to make art and prize it for its cultural impact and expressive power, but for folks who want to trade art for money.

On p3, a contrast appears between Chhabria and Alsup (yes, that Alsup); the latter knows what a computer is and how to program it, and this makes him less respectful of copyright overall. Chhabria doesn't really hide that they think Meta didn't earn their summary judgement, presumably because they disagree with Alsup about whether this is a "competitive or creative displacement." That's fair given the central pillar of the decision on p4:

Llama is not capable of generating enough text from the plaintiffs' books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data.

An analogy might make this clearer. Suppose a transient person on a street corner is babbling. Occasionally they spout what sounds like a quote from a Star Wars film. Intrigued, we prompt the transient to recite the entirety of Star Wars, and they proceed to mostly recreate the original film, complete with sound effects and voice acting, only getting a few details wrong. Does it matter whether the transient paid to watch the original film (as opposed to somebody else paying the fee)? No, their recreation might be candid and yet not faithful enough to infringe. Is Lucas entitled to a licensing fee for every time the transient happens to learn something about Star Wars? Eh, not yet, but Disney's working on it. This is why everybody is so concerned about whether the material was pirated, regardless of how it was paid for; they want to say that what's disallowed is not the babbling on the street but the access to the copyrighted material itself.

Almost every technical claim on p8-9 is simplified to the point of incorrectness. They are talking points about Transformers turned into aphorisms and then axioms. The wrongest claim is on p9, that "to be able to generate a wide range of text … an LLM's training data set must be large and diverse" (it need only be diverse, not large), followed by the claim that an LLM's "memory" must be trained on books or equivalent "especially valuable training data" in order to "work with larger amounts of text at once" (conflating hyperparameters with learned parameters). These claims show how the judge fails to actually engage with the technical details and thus paints with a broad brush dipped in the wrong color.

On p12, the technical wrongness overflows. Any language model can be forced to replicate a copyrighted work, or to avoid replication, by sampling techniques; this is why perplexity is so important as a metric. What would have genuinely been interesting is whether Llama is low-perplexity on the copyrighted works, not the rate of exact replications, since that's the key to getting Llama to produce unlimited Harry Potter slash or whatever.
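For anyone who hasn't met perplexity before, here is a minimal sketch of the idea, assuming a toy character-bigram model with add-one smoothing rather than anything resembling Llama. The function names (`train_bigram`, `perplexity`) and the sample strings are made up for illustration. Perplexity is the exponential of the average negative log-likelihood per predicted character, so text the model has effectively absorbed scores low and unfamiliar text scores high.

```python
import math
from collections import Counter, defaultdict


def train_bigram(text):
    """Count character bigrams as counts[prev][next]."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def perplexity(counts, vocab, text):
    """exp(mean negative log-likelihood) of text under the bigram counts.

    Add-one smoothing over `vocab` keeps unseen bigrams from having
    probability zero. This is a toy stand-in, not how an LLM is scored.
    """
    if len(text) < 2:
        raise ValueError("need at least two characters to score")
    nll = 0.0
    n = 0
    for prev, nxt in zip(text, text[1:]):
        row = counts.get(prev, Counter())
        p = (row[nxt] + 1) / (sum(row.values()) + len(vocab))
        nll -= math.log(p)
        n += 1
    return math.exp(nll / n)


# Hypothetical data: one text the model was trained on, one it never saw.
seen = "the quick brown fox jumps over the lazy dog " * 20
unseen = "zzqx jvvk wpp qqz xkcd zzyzx"
vocab = set(seen + unseen)
model = train_bigram(seen)

print("perplexity on training text:", perplexity(model, vocab, seen))
print("perplexity on unrelated text:", perplexity(model, vocab, unseen))
```

The same measurement on a real model would use its token probabilities instead of bigram counts, but the comparison has the same shape: score the work in question and see how surprised the model is.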

On p17 the judge ought to read up on how Shannon and Markov initially figured out information theory. LLMs read like Shannon's model, and in that sense they're just like humans: left to right, top to bottom, chunking characters into words, predicting shapes and punctuation. Pretending otherwise is powdered-wig sophistry or perhaps robophobia.
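A rough sketch of the kind of Markov text model Shannon experimented with, assuming a character-level chain of order 2; the names `build_model` and `generate` and the tiny corpus are hypothetical. It reads the corpus left to right, counts which character follows each two-character context, then predicts one character at a time from those counts.

```python
import random
from collections import Counter, defaultdict


def build_model(text, order=2):
    """Map each `order`-character context to a Counter of next characters."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def generate(model, seed_text, length, order=2, rng=None):
    """Extend seed_text left to right by sampling the next character
    in proportion to how often it followed the current context."""
    rng = rng or random.Random()
    out = seed_text
    while len(out) < length:
        row = model.get(out[-order:])
        if not row:
            break  # context never seen in the corpus: nothing to predict
        chars, weights = zip(*row.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


# Hypothetical toy corpus; a real experiment would use far more text.
corpus = "the cat sat on the mat and the cat ate the rat "
model = build_model(corpus, order=2)
sample = generate(model, "th", 40, order=2, rng=random.Random(0))
print(sample)
```

Every three-character window of the output already occurs somewhere in the corpus, which is the whole trick: the model only ever predicts continuations it has seen.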

On p23 Meta cites fuckin' Sega v. Accolade! This is how I know y'all don't read the opinions; you'd be hyped too. I want to see them cite Galoob next. For those of you who don't remember the 90s, the NES and Genesis were video game consoles, and these cases established our right to emulate them and write our own games for them.

p28-36 is the judge giving free legal advice. I find their line of argumentation tenuous. Consider Minions; Minions are bad, Minions are generic, and Minions can be used to crank out infinite amounts of slop. But, as established at the top, whoever owns Minions has the right to profit from Minions, and that is the lone incentive by which they go to market. However, Minions are arbitrary; there's no reason why they should do well in the market, given how generic and bad they are. So if we accept their argument then copyright becomes an excuse for arbitrary winners to extract rent from cultural artifacts. For a serious example, look up the ironic commercialization of the Monopoly brand.

[–] diz@awful.systems 3 points 23 hours ago

So, the judge says:

In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use.

And what is that supposed to ever look like? Do authors need a better developed record of effects of movies on book sales, to get paid for movie adaptations, too?

[–] BlueMonday1984@awful.systems 11 points 2 days ago

[–] mountainriver@awful.systems 6 points 1 day ago (1 children)

I think in most EU countries - after lobbying from US copyright corporations - making copies from an illegal original is explicitly banned. This was in order to criminalise downloads from torrents whether you seed or not. And the potential punishment typically involves jail sentences in order to give the police access to the surveillance necessary to prove the crime. Plus, copyright violation is the only crime that also carries punitive damages in all EU countries.

Now I know this because I was against every single one of these disproportionate laws, but some copyright organisations over here should know this. Just saying it would be fun if Meta got to pay out punitive damages. And even funnier if Zuckerberg got some jail time.

[–] sunzu2@thebrainbin.org 1 points 2 hours ago

Just saying it would be fun if Meta got to pay out punitive damages.

It would be pretty great, but we both know that's not how this cookie crumbles.

The oligarchs never get the dick of the law, their property is protected

Plebs pay the taxes, and their property and rights are to be looted to enable oligarchs to live their best lives

Limp dick regimes enable it because corruption fucks hard

And just like that, we are living in the future, and it is a dystopia without much of the cool tech. Just endless extraction and deteriorating socioeconomic conditions