this post was submitted on 17 Feb 2024

1026 points (98.8% liked)

Reddit has reportedly signed over its content to train AI models (mashable.com)

submitted 1 year ago by return2ozma@lemmy.world to c/technology@lemmy.world

153 comments

[–] IchNichtenLichten@lemmy.world 126 points 1 year ago (6 children)

An LLM that behaves like a typical Redditor?

What possible use is that?

[–] SonnyVabitch@lemmy.world 69 points 1 year ago (3 children)

Air Canada offering a refund of tree fiddy.

[–] IchNichtenLichten@lemmy.world 21 points 1 year ago (1 children)

You'll get your refund eventually but first it will try and gaslight you that Air Canada is a woke mind virus before calling you an asshole and then stalking you.

[–] honey_im_meat_grinding@lemmy.blahaj.zone 29 points 1 year ago (1 children)

> What possible use is that?

I've noticed "has this sub gotten more right wing recently?" posts reaching the top post of the day in the last 6 months or so. r/norge and r/unitedkingdom being examples. You can automate bots that change a subreddit's consensus on certain topics by bot-spamming threads pertaining to those topics, especially in the first hour of a thread going up. I don't know if that's happening, or if it has more to do with the Reddit protest that saw mods abdicate their positions last June and new mods being responsible for the change... but it could also be a bit of both.

[–] garibaldi_biscuit@lemmy.world 110 points 1 year ago (1 children)

This is what restricting 3rd-party API access was really all about.

When API access was allowed, all Reddit content was effectively free. They needed to ban 3rd-party apps so they could sell the accumulated content. I expect using content to train AI also factors into it.

[–] bier@feddit.nl 12 points 1 year ago (2 children)

Is it? Because when you build a bot and just scrape Reddit I don't think you can just use the content to train AI, just like the New York Times. The API change was definitely to sell more ads and get a higher IPO, but I don't think it was because of AI.

[–] tigerjerusalem@lemmy.world 104 points 1 year ago* (last edited 1 year ago) (4 children)

Reddit is a trove of user-built content under the guise of community. What Spez did was say "thanks for all the free work, suckers!", put a price sticker on it, and laugh all the way to the bank.

~~And this is why I'm not active on any Internet community anymore.~~ Never mind, I guess I just can't help myself...

[–] nodsocket@lemmy.world 33 points 1 year ago (7 children)

> And this is why I’m not active on any Internet community anymore,

you typed.

[–] Adulated_Aspersion@lemmy.world 23 points 1 year ago (6 children)

And that is another unintended example of why all of my post history was purged before migration.

[–] Verserk@lemmy.dbzer0.com 82 points 1 year ago (4 children)

Considering some of the very wrong but upvoted domain-specific knowledge I've seen on Reddit over the years, I'm not sure the training data is going to be useful for much beyond what every other model can do.

[–] JustZ@lemmy.world 46 points 1 year ago (3 children)

The legal advice in /r/legaladvice was some of the worst garbage I've ever seen. I have zero doubt numerous people had bad outcomes, at best wasting money and time, at worst spending years in jail because of things that sub told them to say and do. Zero doubt.

[–] evatronic@lemm.ee 20 points 1 year ago

That sub was mostly cops just repeating their own bad interpretation of the law. Terrible.

[–] aStonedSanta@lemm.ee 16 points 1 year ago (1 children)

lol subreddits with troll names like trees vs marijuana enthusiasts. Good fun. John Cena has one too, but I can’t recall which subreddit is actually about John Cena.

[–] can@sh.itjust.works 14 points 1 year ago

Potato salad

[–] Voyajer@lemmy.world 59 points 1 year ago (3 children)

This is why I don't blame anyone for editing/deleting their post history on reddit.

[–] Strayce@lemmy.sdf.org 58 points 1 year ago (1 children)

Considering how much of Reddit is already bots, I'm sure this will end fantastically.

[–] gedaliyah@lemmy.world 57 points 1 year ago (3 children)

The AI:

"IANAL so could you ELI5, so AITA?

THIS."

[–] bigkahuna1986@lemmy.ml 25 points 1 year ago (2 children)

Ann frankly, I did Nazi that coming.

[–] storcholus@feddit.de 22 points 1 year ago

Holy shit do I hate that comment

[–] KairuByte@lemmy.dbzer0.com 55 points 1 year ago (2 children)

Their content?

[–] ozoned@lemmy.world 43 points 1 year ago (6 children)

"Reddit has given access to YOUR conversations and posts to AI companies." FTFY

These were created by people, for people, and I will ALWAYS disagree that this data is Reddit's or any other platform's.

Don't forget your direct messages aren't end-to-end encrypted on Reddit, so now AI will be trained on your craziest "private" conversations.

[–] NutWrench@lemmy.world 38 points 1 year ago (7 children)

Reddit is all bots, porn, ads and political shit posts. Good luck getting any useful training content out of that.

[–] ladicius@lemmy.world 19 points 1 year ago

Maybe that's the point? Training the AI to produce the blabbering bullshit that's preferred in social media?

[–] Bobmighty@lemmy.world 32 points 1 year ago (1 children)

With Reddit's severe bot problem, it'll be like training on unfiltered sewage. Garbage in, garbage out.

[–] SVcross@lemmy.world 27 points 1 year ago (1 children)

Damn it. I haven't deleted my account because of how many people I've supported and helped, though I stopped using it a while ago. It seems I'll have to.

[–] HowManyNimons@lemmy.world 17 points 1 year ago (1 children)

I wouldn't bother. They'll just mark all your stuff DELETED=1 and feed it to their AI anyway.

[–] asymmetric@lemmy.ca 26 points 1 year ago

One of the original Reddit memes was quite prescient:

https://i.imgur.com/Fza1Cut.jpg

[–] Yokozuna@lemmy.world 26 points 1 year ago (4 children)

Good thing I scrubbed all of my posts and comments that I could. Fuck that site, straight up and down.

[–] ItsAFake@lemmus.org 21 points 1 year ago (3 children)

You really think they don't have your original comments stored?

[–] EdibleFriend@lemmy.world 28 points 1 year ago (1 children)

It's literally been proven that they do. A guy here on Lemmy was a very common poster on some tech support subreddit. He used one of those account scrubbers and deleted his account. He went back to look a few weeks later and all his comments were back.

[–] Thorny_Insight@lemm.ee 11 points 1 year ago (2 children)

I didn't delete my account, but I used a script to edit all my messages to say that I left because of the attack on 3rd-party apps, and when I check now they all still say that.

[–] DudeImMacGyver@sh.itjust.works 25 points 1 year ago (7 children)

Where's my cut?

[–] Fake4000@lemmy.world 37 points 1 year ago (1 children)

You signed it all away the moment you scrolled down that EULA 😂

[–] erAck@discuss.tchncs.de 24 points 1 year ago (11 children)

It will get trained on some comment posts.

Let reddit die. Join Lemmy or /kbin. https://join-lemmy.org/ https://kbin.pub/

[–] hansl@lemmy.world 24 points 1 year ago

In before poisoning your comments on Reddit turns into the new protest.

[–] 31337@sh.itjust.works 23 points 1 year ago

I wish there were a GPL-like license for content, stating that if you use the content to train generative AI, the model must be open source. Not sure that would be legally enforceable, though (due to fair use).

[–] HowManyNimons@lemmy.world 19 points 1 year ago

Good. Maybe when it cogitates the things I've written it might start offering up some better ideas.

[–] aidan@lemmy.world 18 points 1 year ago

*laughs villainously* This is all going to plan, now there will be some chatbot spewing my insane beliefs

[–] DozensOfDonner@mander.xyz 16 points 1 year ago (1 children)

Why does it sound like Reddit-trained AI will only get dumber?

[–] jol@discuss.tchncs.de 15 points 1 year ago

That would explain why GPT is often so confidently incorrect.

[–] BetaDoggo_@lemmy.world 15 points 1 year ago (3 children)

Who's dumb enough to pay for that? Everyone else is just scraping it for free.

[–] BigTrout75@lemmy.world 14 points 1 year ago (1 children)

So AI models are not farming the federation?

[–] nightwatch_admin@feddit.nl 13 points 1 year ago* (last edited 1 year ago) (1 children)

They probably are, but not the personal/private info like chat/DM, upvotes or downvotes, geolocation, etc., which I highly suspect Reddit did sell.

[–] KairuByte@lemmy.dbzer0.com 26 points 1 year ago* (last edited 1 year ago) (2 children)

Just FYI, your voting is fully public on Lemmy. DMs are “private” but could be intercepted at the server level of any instances involved (yours and the receiver/sender) and of course your geolocation info is visible to the server.

Not saying that is happening, and not trying to spread FUD, but be aware that your info isn’t necessarily private just because a corpo isn’t directly involved.

[–] giddy@aussie.zone 14 points 1 year ago (3 children)

Glad I nuked all my posts and comments and deleted my account last year

[–] doingthestuff@lemmy.world 12 points 1 year ago (5 children)

Good thing I had multiple bots overwrite my content before I deleted it all. Not that someone couldn't recover it, I'm not naive. But the AI bots should miss me.

[–] lvxferre@mander.xyz 12 points 1 year ago* (last edited 1 year ago)

I am not sure about what I'm going to say, but I think that LLMs are a technological dead end. They might get some use now, but eventually the industry will shift towards better models for machine text generation. And, if those models rely on a tiny corpus of hand-reviewed data, instead of shoving as much text as possible into the model (the first "L" in "LLM" is "large"), then Reddit posts/comments will become outright useless.

In other words: Reddit is further degrading the trust of its userbase, and it might not even get much in return.
