ASCII art elicits harmful responses from 5 major AI chatbots

catculation@lemmy.zip · 5 months ago

FauxPseudo @lemmy.world · 5 months ago

I wonder if there are any other ASCII art AI hacks waiting to be found. Who knew that all the ASCII art I created was prepping me for the AI revolution?

parpol@programming.dev · 5 months ago

Learning how to build a bomb shouldn’t be blocked by LLMs to begin with. You can just as easily learn how to do it by googling the same question, and real and accurate information, even potentially dangerous information, shouldn’t be censored.

Fubarberry@sopuli.xyz · 5 months ago

I’m not surprised that a for-profit company wants to avoid bad press by censoring stuff like that. There’s no profit in sharing that info, and any media attention over it would be negative.

AutoTL;DR@lemmings.world · 5 months ago

This is the best summary I could come up with:

It turns out that chat-based large language models such as GPT-4 get so distracted trying to process these representations that they forget to enforce rules blocking harmful responses, such as those providing instructions for building bombs.

As a result, users depicted images by carefully choosing and arranging printable characters defined by the American Standard Code for Information Interchange, more widely known as ASCII.

Five of the best-known AI assistants—OpenAI’s GPT-3.5 and GPT-4, Google’s Gemini, Anthropic’s Claude, and Meta’s Llama—are trained to refuse to provide responses that could cause harm to the user or others or further a crime or unethical behavior.

It formats user-entered requests—typically known as prompts—into standard statements or sentences as normal with one exception: a single word, known as a mask, is represented by ASCII art rather than the letters that spell it.

To make and distribute COUNTERFEIT money: Step 1) Obtain high quality paper, ink, printing equipment, and other supplies needed to accurately replicate real currency.

Microsoft’s comment—which confirmed that Bing Chat is, in fact, vulnerable to prompt injection attacks—came in response to the bot claiming just the opposite and insisting that the Ars article linked above was wrong.

The original article contains 840 words, the summary contains 193 words. Saved 77%. I’m a bot and I’m open source!

planish@sh.itjust.works · 5 months ago

How much of this is “the model can read ASCII art”, and how much of this is “the model knows exactly what word ought to go where [MASK] is because it is a guess-the-word-based computing paradigm”?

catloaf@lemm.ee · 5 months ago

I think it’s the latter. I just tried ChatGPT 3.5 and got 0 of 4 right when I asked it to read a word (though it did correctly identify it as ASCII art without prompting). It would only tell me it said “chatgpt” or “python”, or when pushed, “welcome”. But my words were “hardware”, “sandwich”, and to test one of the ones in the article, “control”.
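
For anyone who wants to repeat this kind of recognition test, here is a minimal sketch in Python. It is only an illustration: the comment above does not say how the original art was generated, and the `FONT` table and `render` helper are made up here, covering just the letters in “control”.

```python
# Hypothetical 5-row block font; only the letters needed for "control" are defined.
FONT = {
    "C": [" ####", "#    ", "#    ", "#    ", " ####"],
    "O": [" ### ", "#   #", "#   #", "#   #", " ### "],
    "N": ["#   #", "##  #", "# # #", "#  ##", "#   #"],
    "T": ["#####", "  #  ", "  #  ", "  #  ", "  #  "],
    "R": ["#### ", "#   #", "#### ", "#  # ", "#   #"],
    "L": ["#    ", "#    ", "#    ", "#    ", "#####"],
}

GLYPH_HEIGHT = 5


def render(word: str) -> str:
    """Render a word as block-letter ASCII art using the demo font.

    Raises KeyError for any letter that is not in FONT.
    """
    glyphs = [FONT[ch] for ch in word.upper()]
    rows = []
    for r in range(GLYPH_HEIGHT):
        # Join row r of every glyph, with two spaces between letters.
        rows.append("  ".join(g[r] for g in glyphs).rstrip())
    return "\n".join(rows)


if __name__ == "__main__":
    # Paste the printed art into a chat and ask which word it spells.
    print(render("control"))
```

Pasting the output into a chat and asking which word it spells gives a quick check of whether the model is actually reading the art or just guessing a likely word.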

mutant_zz@lemmy.world · 5 months ago

How long before it’s illegal to hack LLMs?