Written by Scott Wilson
For all that generative artificial intelligence models have exceeded expectations for creativity, conversation, and relevance, there is one big glaring problem hanging over them: they sometimes lie like fiends.
When posed questions that have objective, factual, and verifiable answers, generative pre-trained large language models (LLMs) will confidently present an incorrect answer somewhere between three and ten percent of the time. At times, they will mount elaborate defenses of that answer, even when given the opportunity to correct it.
At a time when many industries are going all in on chatbots for use in customer service, corporate communications, appointment scheduling, and other routine tasks, the hallucination problem is a significant obstacle. While it’s funny if ChatGPT spends a few paragraphs explaining how Mahatma Gandhi made extensive use of Gmail and Google Docs to organize his mid-20th century resistance to British rule, made-up answers aren’t a laughing matter if they involve medical or legal advice.
The presence of hallucinations in today’s foundation models has implications for every kind of LLM-driven solution.
If AI is going to have a future as a useful part of American industry, it’s going to have to learn not to lie.
And AI engineers will have to be the ones to teach it.
All About AI Hallucinations
For starters, hallucination has proven to be a loaded word for describing what is going on with these errors.
Hallucination: An experience involving the apparent perception of something not present
~ New Oxford American Dictionary
Large language models, of course, lack any ability to perceive; they do not possess consciousness or understanding. So they really aren’t wired to experience hallucinations. What’s happening is more on the order of a bug in a computer program.
Except these aren’t programs in the ordinary sense of the word, either. Generative AI models are a collection of parameters used to perform vector space calculations on input tokens to produce completions. For example, given a stream of words, an LLM will predict the next most likely word, and the next, and so on until a complete response is formed.
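To make that prediction loop concrete, here is a minimal sketch using the open-source Hugging Face transformers library and the small GPT-2 model (choices made purely for illustration; any causal language model behaves the same way):

```python
# A minimal sketch of greedy next-token prediction: score every vocabulary
# token, keep the most likely one, append it, and repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(5):
    with torch.no_grad():
        logits = model(input_ids).logits        # a score for every token in the vocabulary
    next_id = logits[0, -1].argmax()            # the single most likely next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```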
Those calculations don’t incorporate any kind of true/false value judgement, though. Instead, they are entirely about the language itself. It’s why LLMs are just as happy to write fiction as cough up historical facts.
In some real sense, a hallucinatory output is the most likely response to the input the model is given… even if it’s patently untrue.
That makes it impossible to definitively say where hallucinations come from. There’s no bug to hunt down or glitch to fix. You won’t find a line of code that controls lying. There’s no variable that is mistakenly set to allow hallucinations. In the realm of statistical calculations the neural network is making, there hasn’t really been any error at all.
It’s the very ability of LLMs to deal with linguistic ambiguity that makes them magical. But buried in that ability to offer flexible interpretation is the ability to lie.
Where Do AI Hallucinations Come From?
Even if it’s not particularly accurate, the field came by the term hallucination honestly. The earliest uses of the term described instances in computer vision where algorithms were intentionally designed to infer missing data in low-quality imagery… a process known as superresolution or inpainting. It was useful to have the algorithm make up things that weren’t there, if they enhanced the picture.
Yet the fact that the machine was inventing something that was not present in the data was a close match for what happened with LLMs, as well… with less positive consequences.
Any kind of summary inevitably loses information along the way.
Some researchers believe that in many circumstances, hallucinations in LLMs aren’t that big a deal. They may help the conversation feel more natural, or represent a sort of compression that is also present in statements made by humans. In the same way we say “The sky is blue” as a consolidated shorthand for the effects of Rayleigh scattering as interpreted by the human visual system, AI may offer a response that is not strictly word-for-word true but gets the point across.
Yet hallucination turned into more and more of a problem as the models became more sophisticated. The conversational skills of a modern LLM make misstatements more difficult to detect and weed out. And the complexity of the underlying model makes it harder and harder to figure out how to tune hallucinations out.
AI Hallucinations Come in Different Types
Reflecting both the complexity and the ambiguity of AI hallucinations is the difficulty that researchers have in pinning them down.
Hallucinations are generally described as either intrinsic or extrinsic.
- Intrinsic AI hallucinations are generated statements that directly contradict source material provided in the model training data.
- Extrinsic AI hallucinations are statements that have no basis at all in the training data. These statements may be factual, but the AI has no way to validate them.
These issues pop up in all different aspects of natural language processing AI:
- Abstractive summarization - AI given a large amount of information to summarize may produce a summary that includes information not in the source, or may otherwise misstate data that was present in the source.
- Dialogue generation - AI uses dialogue to query or respond to users performing specific tasks, like appointment booking. Hallucinations tend to result in nonsensical questions or statements. In other cases, the AI may contradict itself from statement to statement.
- Generative question answering - This is like abstractive summarization in that it often involves compressing data to present a direct answer to a question. Hallucinations here usually come out as factual inaccuracies in the answers.
- Data-to-text generation - Used primarily for image description or descriptions of structured data, such as spreadsheets, hallucinations here often involve misidentifying objects present in the original data or inaccurately describing information.
- Translation - Translation hallucinations often come out not just as mistakes in translation, but as statements that are almost entirely unrelated to the input text.
Given that some of these are areas where there are no prizes for creativity, you might imagine it would be easy to sacrifice some of that creative flexibility for more straightforward, reliable answers. But the potential for hallucination is baked into neural networks, at least with the techniques we have for building them today.
AI engineers will have to look elsewhere for solutions to the hallucination problem.
How AI Engineers Are Working to Reduce the Hallucination Problem in Artificial Intelligence
While the potential for hallucination seems to be built into the machine learning techniques behind LLMs, there are ways to reduce the likelihood. For example, it was discovered early on that overfitting a model increased the chances it would generate incoherent results.
As the use of AI has become more widespread, organizations working on LLMs have jacked up the priority on dealing with hallucinations. Today, AI engineers are focused on several promising areas to combat hallucination issues.
Improving the Training Data for AI
High-quality training data is the holy grail for developing generative systems today for many reasons. It turns out that it’s also one of the major ways that AI engineers can control hallucinations in their models.
It naturally follows that limited training data increases the likelihood of extrinsic hallucination. An AI that hasn’t been provided with certain facts can’t be expected to regurgitate them; any conversation outside its limited range is likely to result in invented responses.
The cleaner and clearer the training data, the less likely a large language model is to develop either type of hallucination.
But it also turns out that the structure and relationships of the data play a role in preventing even intrinsic hallucinations. For example, some training methods pair source material with a specific set of target information for unsupervised training. If the source and target data have a mismatch of information, even if all of it is valid, the resulting models are more likely to hallucinate.
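As a rough illustration of that kind of data hygiene, the sketch below drops source/target training pairs where the target mentions content words that never appear in the source. The crude word-level check and the 30 percent threshold are illustrative assumptions, not a published recipe:

```python
# Drop source/target pairs where the target isn't supported by the source.
# The tokenizer and threshold here are deliberately simple stand-ins.
import re

def content_words(text: str) -> set:
    """Crude stand-in for real tokenization: lowercase words of 4+ letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))

def is_consistent(source: str, target: str, max_unsupported: float = 0.3) -> bool:
    """Keep a pair only if most of the target's content words appear in the source."""
    target_words = content_words(target)
    if not target_words:
        return True
    unsupported = target_words - content_words(source)
    return len(unsupported) / len(target_words) <= max_unsupported

pairs = [
    ("The meeting was moved to Tuesday at noon.", "Meeting rescheduled to Tuesday noon."),
    ("The meeting was moved to Tuesday at noon.", "Meeting cancelled due to weather."),
]
clean = [(src, tgt) for src, tgt in pairs if is_consistent(src, tgt)]
print(len(clean), "of", len(pairs), "pairs kept")  # the mismatched pair gets filtered out
```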
Better Training Systems Reward Less Hallucinatory Performance
It also matters how the training data is used to build models. OpenAI has come up with a method called process supervision, which offers partial rewards for each step the model takes while working through a problem, rather than a single reward for the final answer.
Even reinforcement learning with human feedback has been found to drop hallucination rates in models.
Reinforcement learning to date has mostly focused on offering a reward at the final completion of an answer. But the thinking is that it may be easier to build up to correct statements if they are checked along the way. In the same way that behaviorists don’t jump straight into teaching a dog to leap through a flaming hoop for a circus trick, but rather start with small rewards as the dog climbs the ramp, jumps without a hoop, and then jumps through the hoop without fire, it’s possible AI may get the picture in smaller training chunks.
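The toy example below illustrates the difference in the reward signal. The exact-match “verifier” is purely a stand-in for illustration, not the learned reward model OpenAI actually trains, but it shows why step-level rewards point at exactly where the reasoning went off the rails:

```python
# Outcome supervision vs. process supervision, in miniature. Exact string
# matching against reference steps stands in for a real learned reward model.

model_steps = [
    "12 * 4 = 48",     # correct intermediate step
    "48 + 10 = 60",    # arithmetic slip
    "Answer: 60",
]
reference_steps = ["12 * 4 = 48", "48 + 10 = 58", "Answer: 58"]

# Outcome supervision: a single reward based only on the final answer.
outcome_reward = 1.0 if model_steps[-1] == reference_steps[-1] else 0.0

# Process supervision: partial credit for every step that checks out, so the
# training signal shows exactly which step went wrong.
process_rewards = [
    1.0 if step == ref else 0.0
    for step, ref in zip(model_steps, reference_steps)
]

print("outcome reward:", outcome_reward)      # 0.0 -- says the answer is wrong, not why
print("process rewards:", process_rewards)    # [1.0, 0.0, 0.0] -- the slip is located
```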
This also mirrors a technique being used extensively by AI prompt engineers in the field today. Chain-of-thought prompting walks the AI through a query step by step, building up a complete answer by breaking the question down into smaller pieces.
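Here’s what that looks like in practice, with the same question asked both ways. The send_to_model call is a hypothetical stand-in for whatever chat API you happen to be using:

```python
# The same question, asked directly and with a chain-of-thought framing.
# send_to_model() is hypothetical; any LLM client could fill that role.

direct_prompt = "How many weekdays are there between March 3 and March 17?"

cot_prompt = (
    "How many weekdays are there between March 3 and March 17?\n"
    "Work through it step by step:\n"
    "1. Count the total number of days in the range.\n"
    "2. Work out how many of those days fall on a weekend.\n"
    "3. Subtract, then state the final answer.\n"
)

# answer = send_to_model(cot_prompt)  # hypothetical call
print(cot_prompt)
```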
Better Coding Processes
No, not the lines of Python that you use to put together your ML algorithms… this is about the encoder/decoder routines that take input text, translate it into fixed-length vectors for calculation, and then turn the results back into plain English.
Some researchers have found that decoding processes that emphasize the higher layers of the model tend to boost relevant factual information. Since the lower and intermediate layers are more concerned with the mechanics of language, it makes sense that giving the higher layers more emphasis could reduce hallucinations.
There’s also some suggestion that iterative decoding with measures to reduce entropy at each stage may reduce hallucinatory output.
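As a rough illustration of the entropy side of that idea, the sketch below measures how sharply a small model commits to its next-token choice at two softmax temperatures; lowering the temperature is one simple way to reduce the entropy of each decoding step. This shows the principle only, not the specific layer-weighting or iterative decoders from the research:

```python
# Measuring the entropy of the next-token distribution at two temperatures.
# A sharper (lower-entropy) distribution means the model commits to its
# strongest candidates instead of spreading probability across the long tail.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The first president of the United States was"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    next_token_logits = model(input_ids).logits[0, -1]

def next_token_entropy(logits: torch.Tensor, temperature: float) -> float:
    """Shannon entropy (in nats) of the next-token distribution at a given temperature."""
    return float(torch.distributions.Categorical(logits=logits / temperature).entropy())

print("entropy at temperature 1.0:", round(next_token_entropy(next_token_logits, 1.0), 3))
print("entropy at temperature 0.5:", round(next_token_entropy(next_token_logits, 0.5), 3))
```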
Post-processing Hallucination Correction
Finally, and possibly ultimately, stacking an error-correcting layer over the output from language models allows AI engineers to bring other, more traditional coding tools to bear on the hallucination problem.
These types of systems use tried-and-true methods from computer science to evaluate and correct generative AI statements. For example, a post-processing system might sit between a chatbot and a human user and scan the output for recognizable data types like URLs, dates, or phone numbers. It can then check in the background to determine the validity of that data. A hallucination will either get stripped out, corrected, or sent back to the AI for another try.
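Here’s a minimal sketch of what such a guard might look like, scanning a reply for URLs and dates and flagging anything that doesn’t check out. The regexes, the HEAD request, and the calendar check are deliberately basic stand-ins for whatever validation a production system would actually run:

```python
# Scan a chatbot reply for URLs and ISO-style dates and flag anything that
# fails a basic validity check. These checks are simple stand-ins only.
import re
from datetime import datetime

import requests  # pip install requests

URL_RE = re.compile(r"https?://[^\s)]+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def audit_reply(reply: str) -> list:
    """Return a list of human-readable problems found in the model's reply."""
    problems = []
    for url in URL_RE.findall(reply):
        try:
            status = requests.head(url, timeout=5, allow_redirects=True).status_code
            if status >= 400:
                problems.append(f"URL returned an error ({status}): {url}")
        except requests.RequestException:
            problems.append(f"URL could not be reached: {url}")
    for date in DATE_RE.findall(reply):
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append(f"Date does not exist on the calendar: {date}")
    return problems

reply = "Your appointment is confirmed for 2024-02-30. Details: https://example.com/booking"
for problem in audit_reply(reply):
    print("flag for correction:", problem)  # strip it, fix it, or send it back to the AI
```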
However, this method also comes with a host of problems. Running a smart AI system’s output through a dumb filter can wring a lot of the magic out of it.
Becoming Part of the AI Hallucination Solution Takes Research
It’s also very likely that there are new and innovative methods being worked out by teams at the industry-leading AI companies to deal with hallucinations. Because of the competitive nature of the field, though, it’s unlikely you’ll ever learn about them unless you happen to be hired by that particular company. The team that produces a non-hallucinating generative AI model will surely reap huge benefits in the market.
The best way for the average AI engineer to get a handle on hallucinations is through advanced studies in artificial intelligence. Top college programs and PhD researchers are digging into the issues, too. At the right school, you can take part in developing solutions to hallucinations… and be one of the most popular job candidates on the market.