Before artificial intelligence can take over the world, it has to solve one crucial problem: AI chatbots have a habit of making things up.
AI tools like ChatGPT have captivated us with their ability to produce human-sounding answers to almost any prompt. But as more people turn to the popular technology for homework help, workplace research or health questions, one of its biggest weaknesses is becoming apparent: AI models frequently fabricate information.
Researchers have come to call this tendency of AI models to produce incorrect information “hallucinations,” or even “confabulations,” as Meta’s AI chief put it in a tweet. Some social media users have gone so far as to brand chatbots “pathological liars.”
But all of those labels reflect our human tendency to anthropomorphize machines, according to Suresh Venkatasubramanian, a professor at Brown University who helped draft the White House’s Blueprint for an AI Bill of Rights.
The reality, as Venkatasubramanian explained, is that large language models, the technology at the core of AI tools like ChatGPT, are simply trained to “generate a response that sounds plausible” to user queries. “So, in that sense, any response that sounds plausible, whether it’s accurate, factual, or fabricated, is considered a valid response, and that’s what it produces,” he clarified. “There’s no inherent understanding of truth in these models.”
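That “plausible, not necessarily true” behavior is easy to see by peeking inside an open language model. The sketch below is a minimal illustration, assuming the open-source Hugging Face transformers library and the small GPT-2 model (assumptions made here for illustration; this story’s reporting does not name any specific model or code): the model simply ranks candidate next words by how likely they sound, and nothing in the code consults any source of facts.

```python
# A minimal sketch of next-token prediction, assuming the open-source
# Hugging Face transformers library and the small GPT-2 model (assumptions
# for illustration; the article does not name any specific model or code).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of the United States is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for every possible next token
    probs = torch.softmax(logits, dim=-1)    # convert scores to probabilities
    top = torch.topk(probs, k=5)             # the five most "plausible" continuations

for p, idx in zip(top.values, top.indices):
    # The loop prints what sounds likely; nothing here checks whether it is true.
    print(f"{tokenizer.decode([idx.item()])!r}  p={p.item():.3f}")
```

Whether the top-ranked word happens to be correct depends entirely on patterns in the training data; there is no separate fact-checking step to fall back on.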
The AI researcher suggested that a better comparison for these computer-generated outputs than hallucinations or lies, terms that imply something is wrong or done with ill intent, is the way his young son told stories at age four. “You just had to ask, ‘And then what happened?’ and he would continue inventing more stories,” Venkatasubramanian recalled. “And he would keep going on and on.”
Companies behind AI chatbots have put some guardrails in place to prevent the worst of these hallucinations. But despite the global enthusiasm for generative AI, experts in the field remain divided on whether chatbot hallucinations can ever be fully fixed.
What is an AI hallucination?
Simply put, an AI hallucination is when an AI model starts generating information that isn’t grounded in reality, said Jevin West, a professor at the University of Washington and co-founder of its Center for an Informed Public.
“But it does it with pure confidence,” West added, “and it does it with the same confidence that it would if you asked a very simple question like, ‘What’s the capital of the United States?’”
That can make it hard for users to tell what’s true and what isn’t, especially when they ask a chatbot about something they don’t already know the answer to, West said.
Several notable examples of AI hallucinations have already made headlines. When Google first unveiled a demo of Bard, its much-anticipated rival to ChatGPT, the tool publicly gave a wrong answer to a question about new discoveries made by the James Webb Space Telescope. (A Google spokesperson said at the time that the episode “underscores the importance of a rigorous testing process” and that the company was working to ensure Bard’s responses meet high standards for quality, safety and accuracy in real-world information.)
A veteran New York lawyer also drew criticism after he used ChatGPT for legal research and submitted a brief citing six “bogus” cases the chatbot appears to have simply invented. And the news outlet CNET was forced to issue corrections after an article generated by an AI tool gave wildly inaccurate personal finance advice when asked to explain how compound interest works.
Cracking down on AI hallucinations, however, could also limit AI tools’ ability to help people with more creative tasks, such as composing poetry or song lyrics on request.
Still, hallucinations carry real risks when people turn to this technology for answers that could affect their health, their voting behavior and other potentially sensitive topics, West told CNN.
Venkatasubramanian added that, for now, relying on these tools for any task where you need factual or reliable information that you cannot immediately verify yourself is asking for trouble. And as the technology spreads, other potential harms loom, he said, such as companies using AI tools to summarize job candidates’ qualifications and decide who should move on to the next round of interviews.
In Venkatasubramanian’s view, these tools should not be used in situations where people’s well-being could be materially affected, at least not at this stage of their development.
Can hallucinations be prevented?
Preventing or fixing AI hallucinations is an active area of research, according to Venkatasubramanian, but it is a deeply complicated problem.
Large language models are trained on enormous amounts of data, and teaching an AI model to generate responses to user prompts involves multiple stages: some of the process is automated, while other parts are guided by human intervention.
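As a rough, hypothetical illustration of the automated part of that pipeline, the toy sketch below trains a tiny model to do nothing more than predict the next character of its training text. It is a simplified example in plain PyTorch, not any company’s actual training code, and the later human-guided stages are only noted in a comment.

```python
# Toy sketch of the automated pretraining objective: predict the next token.
# A simplified illustration in plain PyTorch, not any vendor's real training
# code; later stages (instruction tuning, human feedback) are omitted here.
import torch
import torch.nn as nn

text = "the cat sat on the mat "
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)   # a score for every possible next token

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

inputs, targets = data[:-1].unsqueeze(0), data[1:].unsqueeze(0)
for step in range(200):
    logits = model(inputs)                                    # (1, seq, vocab)
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The objective only rewards matching the training text's next token;
# nothing in this loop rewards being factually correct.
print("final loss:", loss.item())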
“These models are incredibly complex and intricate,” Venkatasubramanian explained, but that complexity also makes them fragile: even small changes in the inputs can lead to dramatic changes in the output.
“That’s just the nature of these models: if something is that sensitive and complicated, it comes with this challenge,” he added. “Identifying the ways in which things can go wrong is very difficult because there are numerous small factors that can lead to issues.”
West agreed: “The problem is that we can’t reverse-engineer hallucinations coming from these chatbots.” Hallucinating, he suggested, may simply be an intrinsic feature of these systems that is here to stay.
Google’s Bard and OpenAI’s ChatGPT both warn users up front that the tools may produce inaccurate responses, and both companies say they are committed to finding solutions.
Google CEO Sundar Pichai said in an interview that hallucination remains an unsolved problem across the field and that every model struggles with it, adding that the issue is the subject of intense debate and that he expects progress.
Sam Altman, CEO of OpenAI, has predicted that it will take a year and a half to two years to make significant progress on ChatGPT’s hallucination problem, stressing the need to strike a balance between creativity and perfect accuracy in AI responses.
Altman has also joked, however, that when it comes to using ChatGPT for research, he probably trusts its answers less than anyone else on Earth.