this post was submitted on 26 Aug 2023
350 points (93.8% liked)

Technology

59300 readers
4609 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
 

While I am quite excited about the Walton Goggins-infused Amazon Fallout series, the show debuted some promo art for the project ahead of official stills or footage and…it appears to be AI generated.

you are viewing a single comment's thread
view the rest of the comments
[–] FaceDeer@kbin.social 2 points 1 year ago* (last edited 1 year ago)

I considered including mention of overfitting in my earlier comment, but since it's such an edge case I felt it would just be an irrelevant digression.

When a particular image has a great many duplicates in the training set - hundreds or even thousands of copies are necessary - then you get the phenomenon of overfitting. In that case you do get this sort of "memorization" of a particular image, because during training you are hitting the neural net over and over with the exact same inputs and really drilling it into them. This is universally considered undesirable, because there's no point to it - why spend thousands of dollars to do something that a copy/paste command could do so much better and more easily? So when image generators are trained the training data goes through a "de-duplication" step intended to try to prevent this sort of thing from happening. Images like the Mona Lisa are so incredibly common that they still slip through the cracks, though.

There's a paper from some months back that commonly comes up when people want to go "aha, generative AI copies its training data!" But in reality this paper shows just how difficult it is to arrange for overfitting to happen. The researchers used an older version of Stable Diffusion whose training set was not well curated and is no longer used due to its poor quality, and even then it took them hundreds of millions of attempts to find just a handful of images from the training set that they could dredge back out of it in recognizable form.