RAG Done Right: Avoiding the Common Mistakes

What RAG Actually Is

RAG (retrieval-augmented generation) is the pattern where the model answers questions using passages retrieved from your data rather than relying only on what it absorbed in training. Done right: factual, grounded, citation-backed answers. Done wrong: a chatbot that hallucinates with extra steps.

Data Preparation

Most RAG failures start before any vector is computed. Your source data has duplicates, conflicting versions, outdated pages, and PDFs full of garbage. Clean and canonicalize first. Fancy retrieval can't fix bad data.
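A minimal sketch of that cleanup step, assuming each document carries an identifier and a last-updated date. The `Doc` type, its field names, and `canonicalize` are hypothetical names for illustration. It only catches stale versions and exact duplicates after whitespace and case normalization; near-duplicates and garbage PDFs need more than this.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str    # hypothetical canonical identifier, e.g. a URL or file path
    text: str
    updated: date


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def canonicalize(docs: list[Doc]) -> list[Doc]:
    # Step 1: keep only the newest version of each doc_id.
    newest: dict[str, Doc] = {}
    for d in docs:
        if d.doc_id not in newest or d.updated > newest[d.doc_id].updated:
            newest[d.doc_id] = d

    # Step 2: drop empty documents and exact duplicates across ids,
    # preferring the most recently updated copy.
    seen: set[str] = set()
    kept: list[Doc] = []
    for d in sorted(newest.values(), key=lambda d: d.updated, reverse=True):
        norm = normalize(d.text)
        if not norm:
            continue
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(d)
    return kept
```

Running this before chunking means every later stage sees one copy of each page, and the newest one.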

Chunking Strategy

Default fixed-size token chunking cuts text mid-idea and destroys context. Use semantic chunking: break on H2/H3 headings, paragraphs, or list items. Each chunk should contain one coherent idea. Add metadata (document title, section, last-updated) to every chunk.
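One way to sketch this, assuming the cleaned source is Markdown where H2/H3 headings appear as `##` and `###` lines and paragraphs are separated by blank lines. The `Chunk` type and `chunk_markdown` function are hypothetical names, and list items are not split out separately here.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    title: str          # document title
    section: str        # nearest H2/H3 heading above the chunk
    last_updated: str


# Matches H2 and H3 headings only ("## ..." or "### ...").
HEADING = re.compile(r"^#{2,3}\s+(.*)$")


def chunk_markdown(md: str, title: str, last_updated: str) -> list[Chunk]:
    chunks: list[Chunk] = []
    section = ""
    paragraph: list[str] = []

    def flush() -> None:
        # Emit the paragraph collected so far as one chunk.
        text = " ".join(paragraph).strip()
        if text:
            chunks.append(Chunk(text, title, section, last_updated))
        paragraph.clear()

    for line in md.splitlines():
        match = HEADING.match(line)
        if match:
            flush()                      # a heading closes the current chunk
            section = match.group(1).strip()
        elif not line.strip():
            flush()                      # a blank line ends a paragraph
        else:
            paragraph.append(line.strip())
    flush()
    return chunks
```

Each chunk carries its section and date, so the retriever can filter on them and the generator can cite them.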

Embedding Choice

Modern embedding models are commoditized. OpenAI, Cohere, and open-source options all perform similarly for general text. Pick one, stick with it, and evaluate recall on your specific corpus.
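Evaluating recall on your own corpus can be as simple as the sketch below. It assumes you have a small hand-labeled set mapping each test query to the chunk ids that should be retrieved; the function name and data shapes are illustrative, not a standard API.

```python
def recall_at_k(
    retrieved: dict[str, list[str]],   # query -> ranked chunk ids from the retriever
    relevant: dict[str, set[str]],     # query -> hand-labeled relevant chunk ids
    k: int,
) -> float:
    """Mean fraction of each query's relevant chunks that appear in its top-k."""
    scores = []
    for query, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labeled answers
        top_k = set(retrieved.get(query, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0
```

Run the same labeled set against each candidate embedding model and compare the numbers, rather than relying on public leaderboards.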

Retrieval: Hybrid Search Always

Pure semantic search misses exact matches (SKUs, product names). Pure keyword search misses paraphrases. Combine: get top-K from each, merge, re-rank. Recall typically jumps 30–40 points.
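For the merge step, one common choice is reciprocal rank fusion (RRF). The text above does not name a merge method, so treat this as one option rather than the only one. The sketch assumes each retriever returns a ranked list of chunk ids; `rrf_merge` is a hypothetical name, and `k=60` is just the constant conventionally used with RRF.

```python
def rrf_merge(
    semantic: list[str],
    keyword: list[str],
    k: int = 60,
    top_n: int = 20,
) -> list[str]:
    """Fuse two ranked lists of chunk ids with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in (semantic, keyword):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Chunks ranked highly by either retriever accumulate more score.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by id so output is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]
```

RRF only looks at rank positions, so it sidesteps the problem that vector similarity scores and keyword scores are not on the same scale. The merged list is what gets handed to the re-ranker.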

The Production RAG Stack

• Cleaned, canonicalized source data.
• Semantic chunking with rich metadata.
• Hybrid retrieval (semantic + keyword).
• Re-ranker on top-20.
• Citation-enforced generation.
• Eval suite running weekly.

Re-ranking

A small re-ranker on top-20 retrieved chunks produces meaningfully better top-5. Cross-encoders (like Cohere's rerank or open-source alternatives) are cheap and effective.
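The shape of the re-ranking step looks roughly like this. The scoring function is passed in as a parameter because in practice it would be a cross-encoder model; `overlap_score` below is only a toy stand-in (plain word overlap) so the sketch runs without any model, and both function names are hypothetical.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    pool: int = 20,
    keep: int = 5,
) -> list[str]:
    """Score the first `pool` candidates against the query and keep the best."""
    scored = [(score(query, c), i, c) for i, c in enumerate(candidates[:pool])]
    # Highest score first; the original position breaks ties.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [c for _, _, c in scored[:keep]]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0
```

Swapping `overlap_score` for a real cross-encoder call changes nothing else in the pipeline, which is part of why this layer is cheap to add.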

Generation With Citations

The system prompt must require citation per claim. The model is forbidden from inventing facts not in the retrieved chunks. If the retrieved chunks don't support the question, the model must say so.
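A sketch of how these rules might be enforced in code as well as in the prompt. The prompt wording, the `[n]` citation format, the refusal sentence, and the function names are all assumptions made for illustration. The validator assumes each sentence places its citation before the closing punctuation.

```python
import re

# Hypothetical fixed refusal string the model is told to use verbatim.
REFUSAL = "I can't answer that from the provided sources."


def build_prompt(question: str, chunks: list[str]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the sources below. "
        "End every sentence with a citation like [1]. "
        f"If the sources do not answer the question, reply exactly: {REFUSAL}\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def citations_valid(answer: str, n_chunks: int) -> bool:
    """Check that every sentence cites at least one existing source number."""
    answer = answer.strip()
    if answer == REFUSAL:
        return True
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s]
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited or any(n < 1 or n > n_chunks for n in cited):
            return False
    return bool(sentences)
```

A check like this only proves that citations are present and point at real chunks, not that the cited chunk actually supports the claim. That part still belongs in the eval suite.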

Top 7 Mistakes

• Throwing all your data in without cleaning.
• Default chunking sized for the model, not for the ideas.
• Pure semantic search with no keyword fallback.
• No re-ranking layer.
• Generation prompt that doesn't enforce citations.
• No eval suite to catch regressions.
• Stale data with no freshness pipeline.

A well-built RAG system is boring. It answers questions accurately, cites sources, refuses when uncertain, and stays up-to-date. The fancy version most builders try first is the opposite of boring - and the opposite of working.

FAQ

Vector DB choice? pgvector if you're on Postgres. Pinecone or Weaviate at scale.

How big a corpus before RAG is worth it? Above ~50 documents. Below that, just include them in context.

Cost? Embedding is cheap. Storage is cheap. The expensive part is the generation call - same as without RAG.

Flowtix Team
