this post was submitted on 28 Feb 2024
47 points (96.1% liked)

Furry Technologists

Science, Technology, and pawbs

[–] Ptsf@lemmy.world 18 points 8 months ago

They understand consent just fine; it's just well within their best interests to pretend they don't until the user base is motivated to manifest financial or legal repercussions. Years of Facebook abuse have shown that any penalty levied will be treated as a "cost of doing business" at best, though.

[–] Linkyu@lemmy.blahaj.zone 15 points 8 months ago

The worst part is, the data has been scraped already, regardless of any opt-out, and there is no explicit confirmation that it hasn't been shared with Midjourney already. The wording on the staff post is... Very Vague about that.

Because if that's the case, no amount of opting-out will change anything. Tumblr says they'll notify Midjourney if some data is now opted-out, but come on, I have absolutely no reason to believe that Midjourney will do anything about it. They don't care, they already have the data.

[–] l_b_i@yiffit.net 6 points 8 months ago (1 children)

So I guess there are two paths for training data: a company selling it explicitly, and companies just scraping accessible data. Not that either is "good", but at least with public data, only the AI company profits.

[–] Soatok@pawb.social 6 points 8 months ago (1 children)

Yep. That's why the first two things I say Automattic MUST do to make things right are about proper consent controls over Automattic's use of data and its sale to AI vendors, while the third is a proposed proactive defense against scrapers.

[–] mindbleach@sh.itjust.works 5 points 8 months ago (1 children)

Making the web un-scrapable to prevent AI is a terrible idea that won't even work. You're talking about DRM against the user's browser... to read publicly-available text... as if the LLM genie can get shoved back in its bottle.

[–] Soatok@pawb.social 2 points 8 months ago* (last edited 8 months ago) (1 children)

No? That's not what Nightshade is. Nightshade isn't DRM.

https://nightshade.cs.uchicago.edu/whatis.html
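
For the gist without clicking through: Nightshade perturbs an image so that a model's feature extractor "sees" a different concept than a human does. Here is a minimal PyTorch sketch of that feature-space poisoning idea, not Nightshade's actual code: `encoder` stands in for a text-to-image model's image feature extractor, and `eps`, `steps`, and `lr` are made-up placeholders.

```python
# Toy sketch of feature-space poisoning (NOT the real Nightshade algorithm).
import torch

def poison(image, anchor, encoder, eps=0.05, steps=200, lr=0.01):
    """Nudge `image` so its features imitate `anchor` (an image of an
    unrelated concept) while keeping the pixel change within +/- eps."""
    delta = torch.zeros_like(image, requires_grad=True)
    target = encoder(anchor.unsqueeze(0)).detach()  # features to imitate
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feats = encoder((image + delta).clamp(0, 1).unsqueeze(0))
        loss = torch.nn.functional.mse_loss(feats, target)
        loss.backward()
        opt.step()
        with torch.no_grad():  # keep the perturbation subtle
            delta.clamp_(-eps, eps)
    return (image + delta).clamp(0, 1).detach()
```

To a person the output still looks like the original; to the encoder it looks like the anchor concept, which is what makes the scraped training pair "poisoned".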

[–] mindbleach@sh.itjust.works 3 points 8 months ago (1 children)

Oh, you meant using a generative network to modify artwork so that generative networks can't learn to modify artwork. A process that's totally not intrinsic to adversarial training.

[–] Soatok@pawb.social 2 points 8 months ago (1 children)

If you make the cost of bypassing Nightshade higher than the cost of convincing people to opt in to their data being used in LLM training, then the outcome is obvious. "If you show me the incentives, I'll show you the outcome."

[–] mindbleach@sh.itjust.works 3 points 8 months ago (1 children)

The cost will become negligible for any nigh-invisible data fuckery. Like how "single-pixel attacks" aren't really a thing anymore. And how alphanumeric Captchas became so hard that humans struggle to discern the letters.

(The cost of Nightshade versus LLMs is nothing, because LLMs are for text.)

There will be nothing you can fuck with in an image that changes what all networks see, without changing what all humans see. Only a style-transfer network that removes the artist's style will ultimately keep training from discerning that style.

This is downright laughable when Nightshade can be applied to any existing image, locally... meaning people training on scraped data could surely identify the presence and impact of Nightshade.

We're talking about networks which already exist that can look at a blob of pixels and pick out which parts look like a Picasso, or an avocado chair, or Hatsune Miku. Stable Diffusion in particular is a denoiser. Identifying damage and nonsense is all it does. If that environment includes deliberate countermeasures, they will be worked into the model through existing training, just like watermarks, JPEG artifacts, and the random noise used to make this shit work in the first place.
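
To sketch what "worked into the model through existing training" means: if the perturbation can be generated locally, a trainer can fold it into the ordinary augmentation pipeline, the same way noise and JPEG artifacts already are. A toy PyTorch training step follows, with `model`, `loss_fn`, and `apply_countermeasure` as hypothetical stand-ins rather than any real library's API.

```python
# Toy sketch: treat a reproducible countermeasure as one more augmentation.
import torch

def training_step(model, loss_fn, opt, images, captions, apply_countermeasure):
    # Half the time, perturb the batch the way an attacker would, so the
    # model learns to produce the same output with or without the poison.
    if torch.rand(()).item() < 0.5:
        images = apply_countermeasure(images)
    opt.zero_grad()
    loss = loss_fn(model(images), captions)
    loss.backward()
    opt.step()
    return loss.item()
```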

[–] Soatok@pawb.social 0 points 8 months ago (1 children)

I choose not to make perfect the enemy of good.

[–] mindbleach@sh.itjust.works -1 points 8 months ago

Word salad, in this context.

The cost you expect to matter will not exist.

[–] ryven@lemmy.dbzer0.com 3 points 8 months ago (1 children)

This page admonishes me for not using an adblocker, but uBlock Origin says it blocked 19 elements? UBO's stealth game is too good, lmao.