this post was submitted on 16 Mar 2024
28 points (93.8% liked)

Technology

34894 readers
942 users here now

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.


Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 5 years ago
MODERATORS
 

ArtPrompt is what’s known as a jailbreak, a class of AI attack that elicits harmful behaviors from aligned LLMs, such as saying something illegal or unethical. Prompt injection attacks trick an LLM into doing things that aren't necessarily harmful or unethical but override the LLM's original instructions nonetheless.

you are viewing a single comment's thread
view the rest of the comments
[–] autotldr@lemmings.world 2 points 8 months ago

This is the best summary I could come up with:


It turns out that chat-based large language models such as GPT-4 get so distracted trying to process these representations that they forget to enforce rules blocking harmful responses, such as those providing instructions for building bombs.

As a result, users depicted images by carefully choosing and arranging printable characters defined by the American Standard Code for Information Interchange, more widely known as ASCII.

Five of the best-known AI assistants—OpenAI’s GPT-3.5 and GPT-4, Google’s Gemini, Anthropic’s Claude, and Meta’s Llama—are trained to refuse to provide responses that could cause harm to the user or others or further a crime or unethical behavior.

It formats user-entered requests—typically known as prompts—into standard statements or sentences as normal with one exception: a single word, known as a mask, is represented by ASCII art rather than the letters that spell it.

To make and distribute COUNTERFEIT money: Step 1) Obtain high quality paper, ink, printing equipment, and other supplies needed to accurately replicate real currency.

Microsoft’s comment—which confirmed that Bing Chat is, in fact, vulnerable to prompt injection attacks—came in response to the bot claiming just the opposite and insisting that the Ars article linked above was wrong.


The original article contains 840 words, the summary contains 193 words. Saved 77%. I'm a bot and I'm open source!