
Chatbots May ‘Hallucinate’ More Often Than Many Realize


By Mushfiqur Ahmed · Published 2 years ago · 4 min read
Photo by Mariia Shalabaieva on Unsplash

When OpenAI's ChatGPT chatbot was introduced, it impressed millions with its humanlike responses and its ability to discuss a wide range of topics. It soon became apparent, however, that the chatbot often fabricates information.

Google and Microsoft soon released chatbots of their own, which produced false information in much the same way. In one widely reported case, ChatGPT invented court cases that a lawyer then cited in a legal brief submitted to a federal judge.

To quantify the problem, Vectara, a new startup founded by former Google employees, is trying to determine how often chatbots deviate from the truth. Its research suggests that even in controlled scenarios, chatbots invent information at least 3 percent of the time, and in some cases as often as 27 percent.

This behavior, known as "hallucination," poses a significant problem for anyone using chatbot technology with sensitive material such as court documents, medical information, or business data.

Because chatbots can respond to almost any request in countless ways, there is no way to determine definitively how often they hallucinate. Simon Hughes, the lead researcher at Vectara, explained that achieving that kind of certainty would require examining all of the information in the world.

To assess the chatbots' accuracy, Dr. Hughes and his team asked them to summarize news articles, a straightforward task whose output can easily be checked against the source. Even then, the chatbots persistently invented information.

Amr Awadallah, the CEO of Vectara and a former Google executive, expressed concern over the chatbots' ability to introduce errors despite being provided with a set of facts.

Overall, the prevalence of hallucination in chatbot behavior raises significant concerns for those relying on this technology for critical purposes.

The researchers contend that when chatbots engage in tasks beyond mere summarization, the rates of hallucination may increase.

Their study also revealed significant variations in hallucination rates among the leading A.I. companies. OpenAI's technologies had the lowest rate, around 3 percent. Systems from Meta, the parent company of Facebook and Instagram, hovered around 5 percent. The Claude 2 system from Anthropic, a San Francisco-based OpenAI rival, topped 8 percent. Palm chat, a Google system, had the highest rate, at 27 percent.

Sally Aldous, a spokesperson for Anthropic, emphasized the company's commitment to ensuring their systems are helpful, honest, and free from hallucinations.

Google declined to comment, and OpenAI and Meta did not respond to requests for comment.

Dr. Hughes and Mr. Awadallah want the research to show people that they should be wary of information that comes from chatbots, and even of the service that Vectara itself sells to businesses. Many companies now offer this kind of technology for business use.

Vectara, a start-up based in Palo Alto, California, has received $28.5 million in seed funding and employs a team of 30 individuals. Amin Ahmad, one of its founders and a former Google artificial intelligence researcher, has been involved in this technology since 2017, when it was incubated within Google and several other companies.

Similar to Microsoft's Bing search chatbot, Vectara's service can retrieve information from a company's private collection of emails, documents, and other files.

The researchers also hope that by publicly sharing their methods and continuously updating them, they can encourage the industry as a whole to address and minimize hallucinations. OpenAI, Google, and other companies are actively working towards mitigating this issue through various means.

Chatbots such as ChatGPT are driven by a technology called a large language model, or L.L.M., which learns its skills by analyzing enormous amounts of digital text, including books, Wikipedia articles, and online chat logs. By pinpointing patterns in all that data, an L.L.M. learns to predict the next word in a sequence.

Because the internet is filled with misleading information, these systems sometimes repeat falsehoods. They also rely on probabilities: when guessing the next word, a model estimates, for example, how likely it is to be "playwright," and sometimes that guess is simply wrong.
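
To make the next-word idea concrete, here is a minimal sketch, in Python, of how a model might pick a word from such a probability estimate. The prompt, the candidate words, and the numbers are invented for illustration; this is not any vendor's actual code.

```python
import random

# A toy next-word distribution a model might assign after the prompt
# "Shakespeare was a famous ..." -- the words and probabilities below
# are made up purely for illustration.
next_word_probs = {
    "playwright": 0.62,
    "poet": 0.25,
    "painter": 0.08,    # plausible-sounding but wrong
    "physicist": 0.05,  # unlikely, yet never impossible
}

def sample_next_word(probs: dict[str, float]) -> str:
    """Pick the next word in proportion to the model's probabilities."""
    words = list(probs)
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# Most of the time this prints "playwright", but roughly one run in eight
# it confidently emits something false -- the essence of a hallucination.
print(sample_next_word(next_word_probs))
```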

Vectara's research shows how this can happen. When the chatbots summarize news articles, they are not repeating untruths picked up elsewhere on the internet; they simply get the summary itself wrong.

The same phenomenon explains why Microsoft's Bing chatbot can make mistakes when it retrieves information from the internet. When asked a question, the chatbot runs a search with Microsoft's Bing search engine, but it cannot pinpoint the single correct answer. Instead, it takes the search results and condenses them into a summary for the user.

Occasionally, this condensed summary is highly flawed. Some chatbots may even reference fabricated internet addresses.
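
The search-then-summarize pattern described above can be sketched in a few lines of Python. The web_search and generate helpers here are placeholders standing in for a real search engine and a real language model; they are not Microsoft's or Vectara's actual APIs.

```python
def web_search(query: str) -> list[str]:
    """Placeholder: return text snippets from a search engine."""
    raise NotImplementedError("wire this up to a real search API")

def generate(prompt: str) -> str:
    """Placeholder: send a prompt to a large language model and return its reply."""
    raise NotImplementedError("wire this up to a real LLM API")

def answer_with_search(question: str) -> str:
    # 1. Retrieve snippets that look relevant to the question.
    snippets = web_search(question)

    # 2. Ask the model to condense those snippets into an answer. The model
    #    does not "know" the correct answer; it only rewrites the retrieved
    #    text, so gaps or ambiguities in the snippets can surface as
    #    confident-sounding errors -- or even invented web addresses --
    #    in the final summary.
    prompt = (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Question: {question}\n\nSources:\n" + "\n---\n".join(snippets)
    )
    return generate(prompt)
```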

To improve the accuracy of their technologies, companies like OpenAI, Google, and Microsoft have developed a variety of methods. OpenAI, for example, refines its technology with feedback from human testers, who rate the chatbot's responses, separating useful, truthful answers from those that are not. Then, through a technique called reinforcement learning, the system spends weeks analyzing those ratings to better distinguish fact from fiction.
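
As a rough illustration of how such human ratings become a training signal, the sketch below scores a pair of answers the way a simple preference-based reward model might. The reward numbers are invented, and this is only a generic reinforcement-learning-from-human-feedback recipe, not OpenAI's actual training code.

```python
import math

# A human rater compared two chatbot answers to the same prompt and marked
# which one was more useful and truthful. The scores below stand in for a
# reward model's outputs; the numbers are made up for illustration.
reward_preferred = 1.8   # reward-model score for the answer the rater preferred
reward_rejected = 0.4    # reward-model score for the answer the rater rejected

# Pairwise preference loss (Bradley-Terry style): the reward model is
# penalized unless it scores the preferred answer above the rejected one.
margin = reward_preferred - reward_rejected
loss = -math.log(1 / (1 + math.exp(-margin)))
print(f"preference loss: {loss:.3f}")  # small when the ranking is respected

# In reinforcement learning from human feedback, the chatbot is then tuned
# to produce answers that the learned reward model scores highly.
```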

Nevertheless, researchers caution that resolving chatbot hallucination is a complex challenge. Due to their reliance on data patterns and probabilistic operations, chatbots occasionally exhibit undesired behavior.

To determine how often chatbots hallucinate when summarizing news articles, Vectara's researchers used a second large language model to check the accuracy of each summary. That proved to be the most efficient way to evaluate such a vast number of summaries.
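
The sketch below shows, under stated assumptions, what that kind of automated check might look like: a separate judge model is asked whether every claim in a chatbot's summary is supported by the article. The judge helper and the prompt wording are assumptions made for illustration, not Vectara's published method.

```python
def judge(prompt: str) -> str:
    """Placeholder: send a prompt to a separate 'judge' language model."""
    raise NotImplementedError("wire this up to a real LLM API")

def summary_is_supported(article: str, summary: str) -> bool:
    """Ask the judge model whether the summary sticks to the article's facts."""
    prompt = (
        "You are checking a summary against its source article.\n"
        "Answer YES if every factual claim in the summary is supported by the "
        "article, and NO if any claim is unsupported or contradicted.\n\n"
        f"Article:\n{article}\n\nSummary:\n{summary}\n\nAnswer:"
    )
    return judge(prompt).strip().upper().startswith("YES")

# Running a check like this over thousands of article/summary pairs and
# counting the NO verdicts gives a hallucination rate like the percentages
# reported above.
```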
