this post was submitted on 28 Jul 2023
335 points (94.7% liked)

Technology


OpenAI just admitted it can't identify AI-generated text. That's bad for the internet and it could be really bad for AI models.

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

[–] Hamartiogonic@sopuli.xyz 96 points 1 year ago (2 children)

Text written before 2023 is going to be exceptionally valuable because that way we can be reasonably sure it wasn’t contaminated by an LLM.

This reminds me of some research institutions pulling up sunken ships so that they can harvest the steel and use it to build sensitive instruments. You see, before the nuclear tests there was hardly any radiation anywhere. However, after America and the Soviet Union started nuking stuff like there’s no tomorrow, pretty much all steel on Earth has been a little bit contaminated. Not a big issue for normal people, but scientists building super sensitive equipment certainly notice the difference between pre-nuclear and post-nuclear steel.

[–] Eheran@lemmy.world 28 points 1 year ago (1 children)

The background radiation did go up, but saying "there was hardly any radiation anywhere" is wrong. Today's steel (and background radiation) is pretty much back to pre-nuke levels. See: Low-background steel; Background radiation.

[–] evatronic@lemm.ee 12 points 1 year ago

It is also worth noting that we can make steel with low or no radioactive contamination; it's just really expensive and hard, and happens in very low quantities.

[–] lily33@lemmy.world 3 points 1 year ago (1 children)

Not really. If it's truly impossible to tell the text apart, then it doesn't really pose a problem for training AI. Otherwise, next-gen AI will be able to tell apart text generated by current-gen AI, and it will get filtered out. So only the most recent data will have unfiltered shitty AI-generated stuff, but they don't train AI on super-recent text anyway.

[–] Womble@lemmy.world 17 points 1 year ago (10 children)

This is not the case. Model collapse is a studied phenomenon for LLMs and leads to deteriorating quality when models are trained on the data that comes from themselves. It might not be an issue if there were thousands of models out there but there are only 3-5 base models that all the others are derivatives of IIRC.
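
A toy sketch of the feedback loop described above, not an LLM experiment. It assumes the "model" is just a Gaussian fitted to a small sample, and that each generation is trained only on samples drawn from the previous generation's fit. The function name, sample size, and generation count are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples and track the spread per generation."""
    rng = random.Random(seed)
    # Generation 0: stand-in for "human" data, drawn from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = [statistics.stdev(data)]
    for _ in range(generations):
        # "Train" the model: estimate mean and standard deviation from the data.
        mu = statistics.mean(data)
        sigma = statistics.stdev(data)
        # The next generation only ever sees the previous model's output.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        spreads.append(statistics.stdev(data))
    return spreads


if __name__ == "__main__":
    spreads = simulate_collapse()
    print("spread at generation 0:  ", spreads[0])
    print("spread at last generation:", spreads[-1])
```

In this toy setting the estimation error compounds from one generation to the next, so the fitted spread tends to shrink and the tails of the original distribution are lost. How closely that carries over to real LLM training depends on how much fresh human data is mixed in at each step.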

[–] Peanutbjelly@sopuli.xyz 49 points 1 year ago (9 children)

The wording of every single article has such an anti AI slant, and I feel the propaganda really working this past half year. Still nobody cares about advertising companies, but LLMs are the devil.

Existing datasets still exist. The bigger focus is in crossing modalities and refining content.

Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?

I swear, most of the people with heavy opinions don't even know half of how the machines work or what they are doing.

[–] _jonatan_@lemmy.world 52 points 1 year ago (4 children)

Probably because LLMs threaten to (and have already started to) shittify a truly incredible number of things like journalism, customer service, books, scriptwriting, etc., all in the name of increased profits for a tiny few.

[–] Peanutbjelly@sopuli.xyz 42 points 1 year ago (10 children)

again, the issue isn't the technology, but the system that forces every technological development into functioning "in the name of increased profits for a tiny few."

that has been an issue for the fifty years prior to LLMs, and will continue to be the main issue after.

removing LLMs or other AI will not fix the issue. why is it constantly framed as if it would?

we should be demanding the system adjust for the productivity increases we've already seen, as well as for what we expect in the near future. the system should make every advancement a boon for the general populace, not the obscenely wealthy few.

even the fears of propaganda. the wealthy can already afford to manipulate public discourse beyond the general public's ability to keep up. the bigger issue is in plain sight, but is still being largely ignored for the slant that "AI is the problem."

[–] Gutless2615@ttrpg.network 17 points 1 year ago (17 children)

It’s a capitalism problem not an AI or copyright problem.

[–] p03locke@lemmy.dbzer0.com 16 points 1 year ago

Yep, the problem was never LLMs, but billionaires and the rich. The problems have always been the rich for thousands of years, and yet they are immensely successful at deflecting their attacks to other groups for those thousands of years. They will claim it's Chinese immigrants, or blacks, or Mexicans, or gays, or trans people. Now LLMs and AI are the new boogieman.

We should be talking about UBI, not LLMs.

[–] jackoneill@lemmy.world 6 points 1 year ago

This isn’t a technological issue, it’s a human one

I totally agree with everything you said, and I know that it will never ever happen. Power is used to get more power. Those in power will never give it up, only seek more. They intentionally frame the narrative to make the more ignorant among us believe that the tech is the issue rather than the people that own the tech.

The only way out of this loop is for the working class to rise up and murder these cunts en masse

Viva la revolucion!

[–] glockenspiel@lemmy.world 3 points 1 year ago* (last edited 1 year ago)

It is a completely understandable stance in the face of the economic model, though. Your argument could be fitted to explain why firearms shouldn’t be regulated at all. It isn’t the technology, so we should allow the sale of actual machine guns (outside of weird loopholes) and grenade launchers.

The reality is that the technology is targeted by the people affected by it because we are hopeless in changing the broader system which exists to serve a handful of parasitic non-working vampires at the top of our societies.

Edit: not to suggest that I’m against AI and LLM. I want my fully automated luxury communism and I want it now. However, I get why people are turning against this stuff. They’ve been fucked six ways from Sunday and they know how this is going to end for them.

Plus, a huge amount of AI doomerism is being pushed by the entrenched monied AI players, like OpenAI and Meta, in order to use a captured government to regulate potential competition out of existence.

[–] tdawg@lemmy.world 3 points 1 year ago

Technology is but a tool. It cannot tell you how to use it. If it's in the hands of a writer it's a helpful sounding board. If it's in the hands of a Netflix producer it's an anti-labor tool. We need to protect people's livelihoods.

[–] HeavenAndHell@lemmy.world 3 points 1 year ago

"Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?

I swear, most of the people with heavy opinions don’t even know half of how the machines work or what they are doing."

Yah I think it's fairly obvious that people are both fascinated and scared by the tech and also acknowledge that under a different economic structure, it would be extremely beneficial for everyone and not just for the very few. I think it's more annoying that people like you assume that everyone is some sort of diet Luddite when they're just trying to see how the tool has the potential to disrupt many, many jobs and probably not in a good way. And don't give me this tired comparison about the industrial revolution because it's a complete false equivalence.

[–] BackupRainDancer@lemmy.world 18 points 1 year ago* (last edited 1 year ago) (2 children)

Predictable issue if you knew the fundamental technology that goes into these models. Hell it should have been obvious it was headed this way to the layperson once they saw the videos and heard the audio.

We're less sensitive to patterns in massive data than these machines are, so the point at which we can't tell fact from AI fiction from the content alone comes before the point at which the machines can't. Good luck with the FB aunts.

A GAN's final goal is to develop content that is indistinguishable from the real thing... Are we surprised?

Edit since the person below me made a great point. GANs may be limited, but there's nothing that says you can't set up a generator LLM and a detector LLM for the sole purpose of improving the generator.

[–] throwsbooks@lemmy.ca 18 points 1 year ago (2 children)

For laymen who might not know how GANs work:

Two AI are developed at the same time. One that generates and one that discriminates. The generator creates a dataset, it gets mixed in with some real data, then all of that gets fed into the discriminator, whose job is to say "fake or not".

Both AI get better at what they do over time. This arms race creates more convincing generated data over time. You know your generator has reached peak performance when its twin discriminator has a 50/50 success rate. It's just guessing at that point.

There literally cannot be a better AI than the twin discriminator at detecting that generator's work. So anyone trying to make tools to detect ChatGPT's writing is going to have a very hard time of it.
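
A minimal sketch of the "50/50" point, under strong simplifying assumptions: real data and generated data are both one-dimensional Gaussians with known densities, and the discriminator is the ideal one from GAN theory, D(x) = p_real(x) / (p_real(x) + p_gen(x)). Nothing here is trained, the function names are hypothetical, and it says nothing about how ChatGPT itself was built.

```python
import math
import random


def normal_pdf(x, mu, sigma=1.0):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def optimal_discriminator(x, mu_real, mu_gen):
    """Probability that x is real, given both densities are known exactly."""
    p_real = normal_pdf(x, mu_real)
    p_gen = normal_pdf(x, mu_gen)
    return p_real / (p_real + p_gen)


def discriminator_accuracy(mu_gen, mu_real=0.0, n=20000, seed=0):
    """Monte Carlo estimate of how often the ideal discriminator is right."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        # Mix real and generated samples in equal proportion.
        is_real = rng.random() < 0.5
        x = rng.gauss(mu_real if is_real else mu_gen, 1.0)
        # On an exact tie (D == 0.5) this rule answers "fake".
        says_real = optimal_discriminator(x, mu_real, mu_gen) > 0.5
        correct += int(says_real == is_real)
    return correct / n


if __name__ == "__main__":
    for mu_gen in (3.0, 1.0, 0.5, 0.0):
        print(f"generator mean {mu_gen}: accuracy {discriminator_accuracy(mu_gen)}")
```

As the generator's distribution moves toward the real one, even this ideal discriminator's accuracy falls toward one half. When the two distributions match exactly, D(x) is 0.5 for every input, so any answer is a guess.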

[–] BackupRainDancer@lemmy.world 3 points 1 year ago

Fantastically put!

[–] EuphoricPenguin22@normalcity.life 3 points 1 year ago (5 children)

Unless I'm mistaken, aren't GANs mostly old news? Most of the current SOTA image generation models and LLMs are either diffusion-based, transformers, or both. GANs can still generate some pretty darn impressive images, even from a few years ago, but they proved hard to steer and were often trained to generate a single kind of image.

[–] art@lemmy.world 18 points 1 year ago (1 children)

We built a machine to mimic human writing. There's going to be a point where there is no difference. We might already be there.

[–] MyUnclesSecret@lemmy.world 3 points 1 year ago

The machine used to mimic human text uses human text. If it can't tell the difference between its own text and human text, it will begin using AI text to mimic human text. This will eventually lead to errors, repetitions, and/or less human-like text.

[–] ChrislyBear@lemmy.world 12 points 1 year ago (3 children)

So every accusation of cheating/plagiarism etc. and the resulting bad grades need to be revised because the AI checker incorrectly labelled submissions as "created by AI"? OK.

[–] Peanutbjelly@sopuli.xyz 7 points 1 year ago* (last edited 1 year ago)

i laughed pretty hard when south park did their chatgpt episode. they captured the school response accurately with the shaman doing whatever he wanted, in order to find content "created by AI."

[–] average650@lemmy.world 9 points 1 year ago

I mean, the entire goal of the technology was to create human-like text.

[–] thebestaquaman@lemmy.world 8 points 1 year ago

This just illustrates the major limitation of ML: access to reliable training data. A machine that has no concept of internal reasoning can never be truly trusted to solve novel problems, and novel problems, from minor issues to very complex ones, are solved in a bunch of professions every day. That's what drives our world forward. If we rely too heavily on AI to solve problems for us, the issue of obtaining reliable training data to train future AIs will only grow. That's why I currently don't think AIs will replace large swaths of the workforce, but will to a larger degree be used as a tool by the humans in the workforce.

[–] Techmaster@lemmy.world 7 points 1 year ago

Relax, everybody. I have figured out the solution. We pass a law that all AI generated text has to be in Pig Latin or Ubbi Dubbi.

[–] kvothelu@lemmy.world 6 points 1 year ago (2 children)

i wonder why Google is still not considering buying reddit and other forums where personal discussion takes place and where the user base sorts quality content free of charge. it has already been established that Google queries are way more useful when coupled with reddit.

[–] Lenins2ndCat@lemmy.world 12 points 1 year ago (1 children)

Making google better is not google's goal. Growth is their goal.

I'm honestly under the impression Google Search is one of their less valuable products, even if it's the one everyone associates the company's name with.

[–] howrar@lemmy.ca 3 points 1 year ago (1 children)

Why buy it when you can get the same data for free?

[–] MercuryUprising@lemmy.world 3 points 1 year ago

Why buy data for accuracy when you don't care and support your company with seo spam?

[–] professor_entropy@lemmy.world 5 points 1 year ago* (last edited 1 year ago)

FWIW It's not clear cut if AI generated data feeding back into further training reduces accuracy, or is generally harmful.

Multiple papers have shown that images generated by high-quality diffusion models, mixed with a proportion of real images (30-50%), improve the adversarial robustness of the models. Similar things might apply to language modeling.

[–] Matriks404@lemmy.world 4 points 1 year ago

I wonder if AI generated texts (or speech) will impact our language. Kinda interesting thing to think about.

[–] Toneswirly@lemmy.world 4 points 1 year ago (1 children)

OpenAI also financially benefits from keeping the hype train rolling. Talking about how disruptive their own tech is gets them attention and investments. Just take it with a grain of salt.

[–] diffuselight@lemmy.world 6 points 1 year ago (8 children)

It's not possible to tell AI-generated text from human writing at any level of real-world accuracy. Just accept that.

[–] RandomlyAssigned@lemmy.world 4 points 1 year ago

On the one hand, our AI is designed to mimic human text; on the other hand, we want to detect AI-generated text that was designed to mimic human text. These two goals don't align at a fundamental level.
