11 January 2026

LLMs don't hallucinate. Bad embeddings do.

LLMs, RAG, and the Real Source of “Hallucinations”

Recently I built a small RAG system end-to-end as a personal project.
The goal was simple: take my extended CV, embed it into a vector database (Chroma), and make it searchable through natural language.

The architecture is straightforward:

  • The frontend sends a question

  • The backend embeds the query

  • Vector search retrieves relevant chunks

  • The LLM uses those results to generate the final answer

This is, in essence, how most RAG systems work today.
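
Here is a minimal sketch of that flow in Python. It assumes a Chroma collection named cv that already holds the embedded CV chunks, and call_llm() stands in for whatever LLM client the backend actually uses.

    import chromadb

    # A persisted Chroma collection that already contains the CV chunks.
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection("cv")

    def call_llm(prompt: str) -> str:
        # Placeholder for the real LLM call (OpenAI, a local model, etc.).
        raise NotImplementedError

    def answer(question: str, n_results: int = 5) -> str:
        # 1. Embed the query and retrieve the closest chunks.
        #    Chroma embeds query_texts with the collection's embedding function.
        results = collection.query(query_texts=[question], n_results=n_results)
        chunks = results["documents"][0]

        # 2. Ground the LLM in the retrieved chunks.
        context = "\n\n".join(chunks)
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}"
        )

        # 3. Generate the final answer from that context.
        return call_llm(prompt)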

And this is where an interesting observation emerged.



Do LLMs really hallucinate?

We often say that LLMs hallucinate.
Anyone who has used ChatGPT has seen answers that are vague, misleading, or simply wrong.

But after building and debugging a real RAG system, I came to a different conclusion:

In many RAG systems, what we call “LLM hallucination” is actually a retrieval problem, not a model problem.

LLMs are trained on massive datasets, and modern models are already surprisingly reliable.
Yes, they are probabilistic systems and will never be perfectly deterministic — but in practice, the biggest source of wrong answers in RAG systems is not the model.

It is the data pipeline feeding the model.


What went wrong in my system

I was embarrassed that my system could not answer a simple question:

“Did I work for company X?”

At first glance, this looked like classic hallucination.

But the problem was not the LLM. It was the data I was feeding it.

After checking the logs, I realized that the structure of my data was weak:

  • important parts were missing

  • documents did not contain enough context

  • vector search results were off

  • metadata was too thin

The LLM was not the issue.
It was simply given bad input — and no model can compensate for that.
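
A check along these lines makes the problem visible: log exactly what the vector search hands to the model for the failing question. A quick sketch, assuming the Chroma collection from the earlier snippet.

    # Inspect what retrieval actually returns for the failing question.
    results = collection.query(
        query_texts=["Did I work for company X?"],
        n_results=5,
        include=["documents", "metadatas", "distances"],
    )

    for doc, meta, dist in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0],
    ):
        print(f"distance={dist:.3f}  metadata={meta}")
        print(doc[:200])
        print("---")

If the chunks that come back carry no section names, no dates, and no company context, the model has nothing solid to answer from.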


The real fix for “hallucinations” in RAG

The problems in my system were not fixed by:

  • better prompts

  • better LLMs

  • more tokens

  • tweaking temperature

They were fixed by:

  • better data structure

  • richer context in documents

  • proper observability of retrieval results

I enriched every chunk with clear context, for example:

  • Section: Experience

  • Subsection: Company A

  • Period: September 2020 – August 2022
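
In code, such enrichment might look roughly like this: each chunk gets an explicit context header and matching metadata before it is added to the collection. The field names and values here are illustrative, not the exact ones from my CV.

    # Illustrative enrichment step: prepend a context header to the chunk text
    # and store the same fields as metadata for filtering and debugging.
    chunk = {
        "text": "Built and maintained internal services ...",
        "section": "Experience",
        "subsection": "Company A",
        "period": "September 2020 – August 2022",
    }

    enriched_text = (
        f"Section: {chunk['section']}\n"
        f"Subsection: {chunk['subsection']}\n"
        f"Period: {chunk['period']}\n\n"
        f"{chunk['text']}"
    )

    collection.add(
        ids=["experience-company-a-01"],
        documents=[enriched_text],
        metadatas=[{
            "section": chunk["section"],
            "subsection": chunk["subsection"],
            "period": chunk["period"],
        }],
    )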

Once the vector search started returning properly contextualized results, the so-called hallucinations almost disappeared.

Same model.
Same prompt.
Different data quality.