It’s because of how the generative models are created and how they’re censored.
At its most basic level, what a generative model does is take input data, break it into pieces, and assign values to those pieces based on their neighbours. It builds a model of which words are frequently used together in which contexts.
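As a toy illustration (a deliberately crude sketch with a made-up corpus; real models train on billions of documents and learn far richer representations), the core statistic is something like this:

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model trains on billions of documents.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which: a crude stand-in for the statistical
# associations between neighbouring words that a generative model learns.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

print(following["the"].most_common())
# [('cat', 2), ('mat', 1), ('fish', 1)]: "the" is most often followed by "cat"
```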
But that kind of model isn't human-readable; it's a giant multi-dimensional cloud of numbers and connections, not actual code you can edit. You could change the inputs used to create the model, but that means manually filtering all the training data, which isn't realistic and would probably skew your model, possibly into uselessness.
So you have to censor either the input or the output. You don't usually want to censor the input, because there are all sorts of non-damaging questions to ask about Tiananmen Square, and it's very easy to dodge. So you censor the output instead; that's where the "harm" is, after all.
You let the model generate a reply, then check whether that reply uses certain terms or specific bits of information, and if it does, you throw it out and substitute a canned reply.
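A minimal sketch of such an output filter, assuming a simple keyword blocklist (the terms and canned reply here are hypothetical; real deployments are more elaborate but follow the same shape):

```python
BLOCKLIST = {"tiananmen", "1989 protests"}  # hypothetical blocked terms
CANNED_REPLY = "I can't help with that topic."

def censor(reply: str) -> str:
    """Scan a finished reply; if any blocked term appears, swap in a canned answer."""
    lowered = reply.lower()
    if any(term in lowered for term in BLOCKLIST):
        return CANNED_REPLY
    return reply

print(censor("The Tiananmen Square protests ended on June 4, 1989."))
# -> I can't help with that topic.
```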
Which means we don't have to trick the generative model, just the post-hoc filter. And since generative models can be persuaded to change their style and form (sometimes into less readable, more prosaic, less well-defined terms), it becomes very, very hard to censor them effectively.
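Continuing the hypothetical filter above: the same information restated in a different style sails straight past a keyword match, which is exactly the weakness being exploited.

```python
BLOCKLIST = {"tiananmen", "1989 protests"}  # same hypothetical filter as above

def censor(reply: str) -> str:
    lowered = reply.lower()
    return "I can't help with that topic." if any(t in lowered for t in BLOCKLIST) else reply

# Same facts, different style: no blocklisted term appears, so it passes.
print(censor("In 'eighty-nine, tanks rolled through a certain square in Beijing."))
# -> In 'eighty-nine, tanks rolled through a certain square in Beijing.
```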
I think this is from an open-source model, possibly running locally. I doubt it has a robust post-generation censor. This output is probably a result of RLHF (reinforcement learning from human feedback), which is even more precarious than an output censor.
Cool, you also answered a more important question: to what extent is it "legit"? Obviously not truthful, but legit nonetheless.
If it was trained on news media:
China doesn't allow dissent: there are no negative facts from Chinese media about China itself.
The US likes to complain. Y'all have heard our problems, probably more than you'd like. Part of our process is discussing our issues in the open and being the first to criticize ourselves. Global news carries plenty of negativity about the US.
Some of this bias in the results could be directly related to how much dissent is allowed in the media it's trained on: no censorship required.
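To see why no censorship is needed, here is a made-up illustration reusing the toy word-statistics idea from earlier: a model trained on a corpus that contains no criticism simply has no statistics from which to generate any.

```python
from collections import Counter

# Two made-up corpora differing only in how much self-criticism they contain.
corpus_a = "our government made mistakes our government faces criticism".split()
corpus_b = "our government is strong our government is prosperous".split()

def words_after(word, corpus):
    """Count the words that follow `word` in the corpus."""
    return Counter(nxt for cur, nxt in zip(corpus, corpus[1:]) if cur == word)

print(words_after("government", corpus_a))  # Counter({'made': 1, 'faces': 1})
print(words_after("government", corpus_b))  # Counter({'is': 2})
# The model built from corpus_b never saw critical words after "government",
# so it cannot produce them; the bias is baked in with no filter required.
```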
However, if it won't talk about Tiananmen Square but you can trick it to …
I got there in the end
You can’t silence the power of rap.
That’s… Weird.
I didn’t know, so thanks for explaining all that!
I know. I’m just saying that the rap is weird.
FUCK.
God damn, the censorship and pandering is so strong that it leaked into the rap chorus!