You can’t make it detect hallucinations, and you can’t make it promise not to produce them.
This is how you know these things are fucking worthless: the people in charge of them think they can combat this by putting anti-hallucination clauses in the prompt, as if the AI would know how to tell it was hallucinating. It already classified it as plausible output by generating it!
They try to do security the same way, by adding “pwease dont use dangerous shell commands” to the system prompt.
Security researchers have dubbed it “Prompt Begging.”
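For anyone lucky enough not to have seen it in the wild, here’s a rough sketch of the pattern. Everything here is hypothetical: `call_model` is a stand-in for whatever chat-completion client is actually in use, and the clause wording isn’t from any particular product’s prompt.

```python
# Hypothetical sketch of "prompt begging": piling pleas into the system prompt
# and hoping the model honors them. call_model is a stand-in, not a real API.

SYSTEM_PROMPT = (
    "You are a helpful assistant.\n"
    "- Do not hallucinate; only state facts you are certain of.\n"
    "- If you are unsure, say 'I don't know' instead of guessing.\n"
    "- Please do not run or suggest dangerous shell commands.\n"
)

def call_model(system_prompt: str, user_message: str) -> str:
    # Stand-in for the real chat API. The clauses above can nudge the model's
    # tone, but they give it no way to recognize its own hallucinations and
    # no mechanism that actually blocks a dangerous command.
    return "(model output would go here)"

if __name__ == "__main__":
    print(call_model(SYSTEM_PROMPT, "How do I free up disk space?"))
```

Any enforcement that actually holds has to live outside the prompt (sandboxing, allowlists, output checks), which is exactly the point being made above.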