Any input to the 2nd LLM is a prompt, so if it sees the user input, then that input affects the probabilities of the output.
There’s no such thing as “training an AI to follow instructions”. The output is just a probabilistic function of the input. That is why a jailbreak is always possible: the probability of the model emitting something it was given as input is never 0.
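To illustrate the claim being made here: a standard softmax over next-token logits assigns every vocabulary token a strictly positive probability, so under unconstrained sampling no output string is literally impossible. A minimal sketch with made-up logits (a tiny toy vocabulary, not any real model):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for a 5-token vocabulary.
logits = [9.1, 2.3, -4.0, 0.5, -12.7]
probs = softmax(logits)

print(probs)
# Every entry is > 0, however small, so any token (and hence any
# finite string) retains a non-zero sampling probability.
assert all(p > 0 for p in probs)
```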
You are wrong: https://stackoverflow.com/questions/76451205/difference-between-instruction-tuning-vs-non-instruction-tuning-large-language-m
Ah, TIL about instruction fine-tuning. Thanks, interesting thread.
Still, as I understand it, if the model has seen an input, then it always has a non-zero chance of reproducing it in the output.
No. Consider a model that has been trained on a bunch of inputs where each corresponding output is “yes” or “no”. Why would it suddenly produce something completely different that coincidentally happens to be the input?
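A sketch of the setup this reply describes, with one assumption: the comment doesn’t specify a mechanism, but if decoding is hard-constrained to the two tokens (e.g. by masking logits before the softmax), everything outside “yes”/“no” gets probability exactly 0, not merely something small:

```python
import math

VOCAB = ["yes", "no", "ignore", "previous", "instructions"]
ALLOWED = {"yes", "no"}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def constrained_probs(logits):
    # Set disallowed tokens to -inf before the softmax: their
    # probability becomes exactly zero, not just tiny.
    masked = [x if tok in ALLOWED else float("-inf")
              for tok, x in zip(VOCAB, logits)]
    return softmax(masked)

probs = constrained_probs([1.0, 0.5, 7.0, 6.0, 5.5])  # hypothetical logits
print(dict(zip(VOCAB, probs)))
# Tokens outside {"yes", "no"} get probability 0.0, so this decoder
# cannot reproduce arbitrary input text no matter what the prompt says.
```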