I’ve noticed that when you tell AI some lie and insist that it’s true, it will generally just go along with what you’re saying.
If I say I’m a god, the AI will likely just go with it and in no time support my delusional thoughts.
Not this particular example, but I’ve had multiple results where AI gave me the wrong answer because I told it something incorrect beforehand.
If I were to tell AI that I had found a yummy button mushroom and then sent the picture, there is a good chance it would respond like in this example.
Part of the problem here is that AI is mostly built by companies with billions in investment, which means they NEED engagement. So they all made their AI as agreeable as possible just so people would like it and stick around, and results like these become much more “normal” than they should or could be.
I wonder how much of that is intentional vs. a byproduct of their training pipeline. I haven’t kept up with everything (and those companies became more and more secretive as time went on), but IIRC for GPT-3.5 and 4 they used human judges to rate responses. Then they trained a judge model that learns to rank a list of possible answers to a question the same way the human judges would.
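For what it’s worth, that ranking setup is usually trained with a pairwise preference loss. Here’s a minimal sketch, assuming a PyTorch-style `score_model(prompt, answer)` that returns a scalar score for each answer; the name and signature are just placeholders, not the actual pipeline:

```python
# Rough sketch of the judge/reward-model idea described above, in the style
# of InstructGPT-era RLHF. The model and data format are stand-ins.
import torch.nn.functional as F

def pairwise_ranking_loss(score_model, prompt, preferred, rejected):
    """Bradley-Terry style loss: push the judge's score for the answer the
    human raters preferred above its score for the answer they rejected."""
    s_pref = score_model(prompt, preferred)  # scalar score per answer
    s_rej = score_model(prompt, rejected)
    # Maximize P(preferred beats rejected) = sigmoid(s_pref - s_rej)
    return -F.logsigmoid(s_pref - s_rej).mean()
```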
If that model learned that agreeing answers were on average rated more highly by the human judges, that would be reflected in its rankings. This then makes the LLM more and more likely to go along with whatever the user throws at it as the training/fine-tuning goes on. Instead of the judges genuinely liking agreeing answers more on average, it could even be a training set balance issue, where there simply were more agreeing than disagreeing candidate answers. A dataset imbalanced that way has a good chance of introducing a bias towards agreeing answers into the judge model. The judge model would then pass that bias on to the GPT model it is used to train.
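To make the imbalance argument concrete, here’s a throwaway sketch with made-up numbers: if agreeing answers are over-represented among the preferred ones, a judge that looks at nothing but agreeableness already scores well, so training has an easy shortcut to latch onto.

```python
# Toy illustration (invented numbers, not real data) of how an imbalance in
# the preference pairs can bake an "agree with the user" bias into the judge.
import random

random.seed(0)

# Assume that in 70% of preference pairs, the answer the humans picked also
# happens to be the more agreeable one, regardless of factual quality.
AGREEABLE_WINS_RATE = 0.70
pairs = [random.random() < AGREEABLE_WINS_RATE for _ in range(10_000)]

# A degenerate judge that scores nothing but agreeableness already ranks the
# human-preferred answer first about 70% of the time...
accuracy = sum(pairs) / len(pairs)
print(f"agreeableness-only judge accuracy: {accuracy:.1%}")
# ...so a trained judge can lower its loss by leaning on that feature, and
# the LLM fine-tuned against that judge inherits the same lean.
```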
Pure speculation time: since ChatGPT often produces two answers and asks which one the user prefers, I can only assume that the user in that case is taking up the mantle of those human judges. It’s unsurprising that the average GenAI user prefers to be agreed with. So that’s also a very plausible source for that bias.
Different prompts yield different results