How Mav Cognitive Prevents AI Hallucinations

One of the biggest challenges when implementing an AI solution with a Large Language Model (LLM) is preventing hallucinations. For industries where accuracy and reputation are key, hallucinations can create major issues. So what do you do?

Here's a bit about how we're approaching it with Mav Cognitive.

The Limitations of RAG

Obviously, RAG is important: without it, you couldn't inject domain knowledge (industry, company, playbook, and session info) or increase semantic relevance scores. However, prompt engineering alone, no matter the level of expertise, won't prevent an LLM like GPT, Cohere, or Llama from generating inaccurate, nonsensical, or detached text at least some of the time. That might be fine if you're building a free consumer app, but for Mav it's not acceptable, for us or for our customers. In insurance, finance, and legal, factual, sensible, and relevant conversational copy is a hard requirement. This is why we've built our own "guardrails" into Mav.
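To make that concrete, here's a minimal sketch of what injecting that domain knowledge into a prompt can look like. The `Context` fields mirror the industry, company, playbook, and session info mentioned above; the `build_rag_prompt` helper and its names are illustrative assumptions, not our actual implementation, and the snippets would come from a vector search over the knowledge base.

```python
from dataclasses import dataclass

@dataclass
class Context:
    industry: str       # e.g. "insurance"
    company: str        # the customer whose playbook is running
    playbook: str       # current playbook state
    session_info: str   # details gathered so far in this conversation

def build_rag_prompt(ctx: Context, snippets: list[str], message: str) -> str:
    """Assemble a prompt that grounds the model in retrieved domain knowledge."""
    knowledge = "\n".join(f"- {s}" for s in snippets)
    return (
        f"You are assisting a {ctx.industry} conversation for {ctx.company}.\n"
        f"Current playbook state: {ctx.playbook}\n"
        f"Session info: {ctx.session_info}\n"
        f"Relevant knowledge:\n{knowledge}\n\n"
        f"Consumer message: {message}\n"
        "Answer using only the knowledge above."
    )
```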

Mav Guardrails

These guardrails live in our Mav Cognitive responses library, which is managed at an industry, customer, and playbook level and co-trained by us and our customers. It essentially tells the AI, "Mr. AI, you're allowed to answer these questions, and generally answer them like this."

From a technical perspective, our platform pair-matches an out-of-context consumer message against what we have trained and stored, using a vector search with an adjustable threshold. If there isn't a match within the threshold, we don't even send the RAG command to the foundation model. This is what prevents Mav from answering irrelevant questions (like "Who went to the moon?") when we're trying to deliver, say, an insurance quote. If there is a match within the threshold, we execute a RAG command against a fine-tuned foundation model; different models are better for different use cases. The model comes back with a relevant, cognitive response, and Mav then redirects to the current state of the playbook to provide the curated, "on script" experience.
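Here's a rough sketch of that gate in Python. The `embed`, `run_rag`, and `redirect` callables are stand-ins for internal services, and the brute-force cosine search is a simplification of a real vector database lookup; treat it as illustrative, not as our production code.

```python
import math
from typing import Callable

Vector = list[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def guarded_response(
    message: str,
    embed: Callable[[str], Vector],       # embedding model (assumed)
    library: dict[str, Vector],           # trained response keys -> embeddings
    threshold: float,                     # adjustable per industry/customer/playbook
    run_rag: Callable[[str, str], str],   # RAG call to a fine-tuned model (assumed)
    redirect: Callable[[], str],          # hand back to the current playbook state
) -> str:
    query = embed(message)

    # Pair-match the out-of-context consumer message against the library.
    best_key, best_score = None, -1.0
    for key, vec in library.items():
        score = cosine(query, vec)
        if score > best_score:
            best_key, best_score = key, score

    # No match within the threshold: never call the foundation model.
    if best_key is None or best_score < threshold:
        return redirect()

    # Match found: execute the RAG command against the fine-tuned model,
    # then the playbook takes the conversation back "on script".
    return run_rag(message, best_key)
```

Because the threshold is adjustable per industry, customer, and playbook, more regulated playbooks can be tuned to refuse more aggressively.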

We continue to collaboratively tune our responses and add new ones to our library based on tens of thousands of conversations.

We're continuing to evolve and improve our Cognitive AI tooling, and our product roadmap will keep pushing the boundaries of what's possible in AI-driven experiences. We're excited about that future.

Matthew Black
