this post was submitted on 10 Jan 2024

1216 points (96.6% liked)

Technology

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below; to ask if your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

Approved Bots

"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?" (fosstodon.org)

submitted 10 months ago by Star@sopuli.xyz to c/technology@lemmy.world

237 comments

you are viewing a single comment's thread

[–] Star@sopuli.xyz 397 points 10 months ago* (last edited 10 months ago) (12 children)

It's so ridiculous: when corporations steal everyone's work for their own profit, no one bats an eye, but when a group of individuals does the same to make education and knowledge free for everyone, it's somehow illegal, unethical, immoral and whatnot.

[–] Grimy@lemmy.world 83 points 10 months ago (12 children)

Using publicly available data to train isn't stealing.

Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can't freely use copyrighted materials to train on, it drives up the cost so much that only a handful of companies can afford the data.

They want to kill the open-source scene and are manipulating you to do so. Don't build their moat for them.

[–] givesomefucks@lemmy.world 55 points 10 months ago* (last edited 10 months ago) (11 children)

And using publicly available data to train gets you a shitty chatbot...

Hell, even using copyrighted data to train isn't that great.

Like, what do you even think they're doing here for your conspiracy?

You think OpenAI is saying they should pay for the data? They're trying to use it for free.

Was this a meta joke and you had a chatbot write your comment?

[–] tourist@lemmy.world 25 points 10 months ago

Was this a meta joke and you had a chatbot write your comment?

if someone said this to me I'd cry

[–] webghost0101@sopuli.xyz 19 points 10 months ago* (last edited 10 months ago) (15 children)

The point that was being made is that publicly available data includes a huge amount of copyrighted data to begin with, and it's pretty much impossible to filter it out. A prime example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn't copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.

This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only megacorporations that can pay copyright fines as a cost of doing business will be able to afford to train decent AI.

The only other option for producing this kind of AI is a very narrow, curated set of known materials with a public-use license, but that is not going to get you anything competent on its own.

EDIT: In case it isn't clear, I am clarifying what I understood from Grimy@lemmy.world's comment, not adding to it.

[–] RainfallSonata@lemmy.world 16 points 10 months ago (4 children)

I didn't want any of this shit. IDGAF if we don't have AI. I'm still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.

[–] myslsl@lemmy.world 5 points 10 months ago

Machine learning techniques are often thought of as fancy function approximation tools (i.e. for regression and classification problems). They are tools that receive a set of values and spit out some discrete or possibly continuous prediction value.
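To make the function-approximation framing concrete, here is a minimal Python sketch using ordinary least squares, about the simplest regression method there is. The data points and the names `fit_line` and `predict` are made up for illustration; real ML models fit far more flexible functions, but the pattern of learning parameters from examples and then predicting on new inputs is the same.

```python
# Toy "ML as function approximation": fit y ≈ a*x + b to noisy samples
# by ordinary least squares, then use the fitted function on a new input.
# The sample data below is invented for illustration.

def fit_line(xs, ys):
    """Return slope a and intercept b minimizing sum((a*x + b - y)**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def predict(model, x):
    """Evaluate the fitted line at x."""
    a, b = model
    return a * x + b

# Samples of an "unknown" function, roughly y = 2x + 1 plus noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
model = fit_line(xs, ys)
print(model, predict(model, 5.0))
```

Classification works the same way in spirit, except the approximated function outputs a discrete label (or a probability) instead of a continuous value.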

One use case: there are a lot of really hard and important problems in CS that we can't solve exactly and efficiently (look up TSP, SOP, SAT and so on), but that we can solve approximately, using heuristics, in reasonable time. Often the accuracy of the heuristic even determines the efficiency of our solution.
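A minimal Python sketch of the exact-versus-heuristic trade-off for TSP, on a made-up set of six cities. The greedy nearest-neighbor rule used here is one classic hand-written heuristic, not an ML method and not necessarily what the comment had in mind; it only illustrates that a heuristic can run in roughly O(n²) time while exact search over all tours grows factorially.

```python
import itertools
import math

def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def nearest_neighbor_tour(points, start=0):
    """Greedy heuristic: always move to the closest unvisited city. O(n^2)."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def brute_force_tour(points):
    """Exact answer by trying every permutation after city 0: O(n!), tiny n only."""
    rest = range(1, len(points))
    best = min(
        itertools.permutations(rest),
        key=lambda p: tour_length(points, (0,) + p),
    )
    return [0] + list(best)

# Hypothetical city coordinates.
cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 1)]
greedy = nearest_neighbor_tour(cities)
exact = brute_force_tour(cities)
print(tour_length(cities, greedy), tour_length(cities, exact))
```

With a few dozen cities the brute-force function becomes unusable, while the greedy tour stays fast but carries no optimality guarantee and can be noticeably longer than the best tour.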

Additionally, sometimes we want predictions for other reasons. For example, software that relies on user preference, that predicts home values, that predicts the safety of an engineering plan, that predicts the likelihood that a person has cancer, that predicts the likelihood that an object in a video frame is a human etc.

These tools have legitimate and important use cases; it's just that a lot of the hype right now is centered around the dumbest possible uses and a bunch of idiots trying to make money regardless of any associated ethical concerns or consequences.

[–] RememberTheApollo@lemmy.world 5 points 10 months ago (1 children)

It doesn’t matter what you want. What matters is if corporations can extract $ from you, gain an efficiency, or cut their workforce using it.

That’s what the drive for AI is all about.

[–] RainfallSonata@lemmy.world 2 points 10 months ago

No doubt.

[–] webghost0101@sopuli.xyz 2 points 10 months ago

A perfectly valid stance to take.

[–] Grimy@lemmy.world 1 points 10 months ago

You don't have to use it. You can even disconnect from the internet completely.

What's the benefit of stopping me from using it?

[–] TwilightVulpine@lemmy.world 13 points 10 months ago

It's not like all this data was randomly dumped at the AIs. For data sets to serve as good training materials they need contextual information so that the AI can discern patterns and replicate them when prompted.

We see this in the fact that you can literally prompt AIs with whose style you want them to emulate, meaning that the data they were fed contained that information.

Midjourney is facing extra backlash from artists after a spreadsheet was leaked containing a list of artist styles their AI was trained on. This means they can keep track of it, and that they deliberately trained the AI on those artists' works. They simply pretend this is impossible to figure out so that they aren't obliged to seek permission from and compensate the artists whose works were used.

[–] CIA_chatbot@lemmy.world 5 points 10 months ago

Hey man, that’s damn hurtful

[–] dependencyinjection@discuss.tchncs.de 3 points 10 months ago

I’m not sure if someone else has brought this up, but I could see OpenAI and other early adopters pushing for tighter controls of training data as a means to be the only players in town. You can’t build your own competing AI because you won’t have the same amount of data as us and we’ll corner the market.

[–] Grimy@lemmy.world 1 points 10 months ago (2 children)

If the data has to be paid for, OpenAI will gladly do it with a smile on their face. It guarantees them a monopoly and ownership of the economy.

Paying more but having no competition except Google is a good deal for them.

[–] TwilightVulpine@lemmy.world 45 points 10 months ago (4 children)

OpenAI is definitely not the one arguing that they stole data to train their AIs, and Disney will be fine whether or not AI requires owning the rights to training materials. Small artists, the ones protesting the most against it, will not be. They are already seeing job and commission opportunities decline because of it.

Being publicly available in some form is not permission to use and reproduce those works however you feel like. Only the real owners have the right to decide. We on the internet have always been a bit blasé about it, sometimes deservedly, but now that we are reaching a point where we are driving away the very artists we enjoy and are inspired by, maybe we should be a bit more understanding of their position.

[–] winterayars@sh.itjust.works 27 points 10 months ago (1 children)

That depends on what your definition of "publicly available" is. If you're scraping New York Times articles and pulling art off Tumblr then yeah, it's exactly stealing in the same way scihub is. Only difference is, scihub isn't boiling the oceans in an attempt to make rich people even richer.

[–] unionagainstdhmo@aussie.zone 3 points 10 months ago

Also, Sci-Hub doesn't make any money off the works.

[–] kibiz0r@lemmy.world 21 points 10 months ago (3 children)

We have a mechanism for people to make their work publicly visible while reserving certain rights for themselves.

Are you saying that creators cannot (or ought not be able to) reserve the right to ML training for themselves? What if they want to selectively permit that right to FOSS or non-profits?

[–] BURN@lemmy.world 12 points 10 months ago

That’s exactly what they’re saying. The AI proponents believe that copyright shouldn’t be respected and they should be able to ignore any licensing because “it’s hard to find data otherwise”

[–] Asafum@feddit.nl 13 points 10 months ago

Scientific research papers are generally public too, in that you can always reach out to the researcher and they'll provide the papers for free, it's just the "corporate" journals that need their profit off of other peoples work...

[–] grue@lemmy.world 12 points 10 months ago* (last edited 10 months ago) (1 children)

They want to kill the open-source scene

Yeah, by using the argument you just gave as an excuse to "launder" copyleft works in the training data into permissively-licensed output.

Including even a single copyleft work in the training data ought to force every output of the system to be copyleft. Or if it doesn't, then the alternative is that the output shouldn't be legal to use at all.

[–] deweydecibel@lemmy.world 9 points 10 months ago* (last edited 10 months ago) (2 children)

The point is that the entire concept of AI training on people's work to make profit for others, without the creator's permission and compensation, is wrong, regardless of whether it's corporate or open source.

[–] Angry_Maple@sh.itjust.works 7 points 10 months ago (1 children)

I think I've decided to not publish anything that I want to keep ownership of, just in case. There's an entire planet's worth of countries, which will all have their own sets of laws. It takes waay too long to polish something, only to just give it away for free haha. Someone else is free to do that work if it is that easy. No skin off my back.

I think it's similar to many other handmade crafts/items. Most people will buy their clothes from stores, but there are definitely still people who make beautiful clothing by hand, better than machines could.

Don't even get me started on stuff like knitting. It already costs the creator a crap ton of money just for the materials. It takes a crap ton of time to make those, too. Despite the costs, many people just expect those knitted pieces for practically free. The people who expect that pricing are also free to go with machine-produced crafts/items instead.

It comes down to what people want, and what they're willing to pay, imo. Some people will find value in something physically being put together by another human, and other people will find value in having more for less. Neither is "wrong" necessarily, so long as no one is literally ripped off. (With over 8 billion people, it's bound to happen at least once. I feel bad for whoever that is.)

That being said, we'll never be able to honestly say that the specific skills and techniques currently required are exactly the same. It would be like calling a photographer amazing at realist painting because their photo looks like real life. Photographers and painters both have their place, but they are not the same.

I think that's also part of what's frustrating so many artists. Coding AI is not the same as using the colour wheel, choosing materials, working fine motor control, etc. It's not learning about shadows, contrast, focal points, etc. I can definitely understand people not wanting those aspects to be brushed off, especially since it usually takes most of a lifetime to achieve. A music generator and a violin may both make great music, but they are not the same, and they require different technical skills.

I'll never buy AI art if I have any say in the matter. I'll support handmade stuff first, every time.

[–] Meowoem@sh.itjust.works 3 points 10 months ago (1 children)

I love that the people who push this kind of rhetoric often consider themselves left wing, it's just so silly.

'every word you ever utter must be considered private property and no other human may benefit from it without payments!'

I mean yes I know you're going to say socialism is about workers getting fair pay but come on, this is just pure rent seeking. We're a global community of people; if this comment helps train an AI that can help other people live better lives and better access medicine, education or other services, then I think that's a wonderful thing.

And yes, of course it should be open source and free to all people; that's why these pushes to make sure only corporations can afford AI are so infuriating.

[–] General_Effort@lemmy.world 3 points 10 months ago

So true.

This talking point, too, is so infuriatingly silly:

I mean yes I know you’re going to say socialism is about workers getting fair pay

Workers, by definition, don't own what they produce. Copyrights are intellectual property; business capital. Somehow, capitalists are workers in the minds of these people. This is your mind on trickle-down economics.

[–] SchizoDenji@lemm.ee 7 points 10 months ago* (last edited 10 months ago) (1 children)

All of the AI fearmongering is fuelled by megacorps who fear that AI in some form will eat into their profits and that they won't be able to make money off of it.

Image generation also had similar outcry because open source models smoked all the commercial ones.

[–] Meowoem@sh.itjust.works 3 points 10 months ago

Yeah, just wait until they see the ai design tools that allow anyone to casually describe the spare part or upgrade they want and it'll be designed and printed at home or local fab shop.

Lots of once fairly safe monopolies are going to start looking very shaky, and then things like natural-language cookery tool arms will disrupt even more...

We've only barely started to see what the tech we have now is able to do. Yes, a million shitty chatbots and image-gen apps are cashing in on the hype, but once some killer apps emerge, people won't be able to ignore it any longer.

[–] BURN@lemmy.world 4 points 10 months ago

Too bad

If you can’t afford to pay the authors of the data required for your project to work, then that sucks for you, but doesn’t give you the right to take anything you want and violate copyright.

Making a data agnostic model and releasing the source is fine, but a released, trained model owes royalties to its training data.

[–] General_Effort@lemmy.world 3 points 10 months ago

True, Big Tech loves monopoly power. It's hard to see how there can be an AI monopoly without expanding intellectual property rights.

It would mean a nice windfall profit for intellectual property owners. I doubt they worry about open source or competition but only think as far as lobbying to be given free money. It's weird how many people here, who are probably not all rich, support giving extra money to owners, merely for owning things. That's how it goes when you grow up on Ayn Rand, I guess.

[–] Coasting0942@reddthat.com 2 points 10 months ago

This is the hardest thing to explain to people. Just imagine it as a person with unlimited memory.

OpenAI is sending said person to view every piece of human work; the person learns and makes connections, then makes art or reports based on what you tell or ask them.

Sci-Hub is doing the same thing, but you can ask it for a specific book and it will write it down word for word for you, an exact copy.

Morally, both should be free to do so. But we have laws that say the Sci-Hub human is illegally selling the work of others, whereas the OpenAI human has to be given so many specific instructions to reproduce a human work that it’s practically like handing it a book and having it hand the book back to you.

[–] richieadler@lemmy.myserv.one 23 points 10 months ago* (last edited 10 months ago) (1 children)

Cue the Max Headroom episode where the blanks (disconnected people) are chased by the censors because the blanks steal cable so their children can watch the educational shows and learn to read, and they are forced to use clandestine printing presses to teach them.

[–] grue@lemmy.world 3 points 10 months ago

Reminds me of this: https://www.gnu.org/philosophy/right-to-read.html
