

RAG explained for business owners (without the jargon)

Andrew Roper · 7 min read

Quick answer: RAG (retrieval-augmented generation) is the technique of giving an AI model your specific documents at the moment it answers a question, so it answers from your data rather than from its training. It’s the difference between an AI that confidently invents answers and one that quotes from your knowledge base. For most useful business AI, RAG isn’t optional — it’s the foundation.

The request comes up in every AI project we scope: “we want the AI to know about our business.” The technical answer is almost always “we’ll use RAG.” Said to a non-technical owner, that phrase is rarely useful.

This article is the explanation we’d give before quoting on a RAG-based AI system.

What RAG actually is

A general-purpose AI model has been trained on a snapshot of public text. It knows broadly about the world. It doesn’t know:

  • Your customers
  • Your products
  • Your prices
  • Your policies
  • Anything that happened after its training cutoff
  • Any document specific to your business

If you ask it a question that requires this information, two things can happen. The good outcome: the model says “I don’t know.” The bad outcome (more common): the model produces a confident-sounding answer that’s plausible and wrong.

RAG is the engineering pattern that closes this gap. The pattern, in one sentence: when a question comes in, find the most relevant documents from your knowledge base, give them to the AI alongside the question, and instruct the AI to answer only from those documents.

The acronym stands for Retrieval-Augmented Generation. “Retrieval” is the find-the-relevant-documents step. “Generation” is the AI producing an answer. “Augmented” refers to the AI’s capabilities being extended by the documents it’s been given.

A concrete example

A client comes to us wanting an AI assistant that can answer customer questions about their service business. The questions look like:

  • “What’s your warranty policy on installations?”
  • “Can I reschedule my appointment for next Tuesday?”
  • “How much does a callout cost outside business hours?”

Without RAG, the AI guesses. The guesses are sometimes right (warranty answers are similar across the industry). They’re sometimes wrong in ways that matter (your specific pricing isn’t industry standard).

With RAG, when a customer asks “what’s your warranty policy on installations?”, the system:

  1. Finds the document in your knowledge base that covers warranty policies (typically a specific section of a policies document)
  2. Gives that document to the AI alongside the customer’s question
  3. Instructs the AI to answer only based on what’s in the document
  4. Returns the AI’s answer, which is grounded in your actual policy

The customer gets your real warranty policy, not the AI’s impression of warranty policies in general. The model is doing what models do well (writing coherent answers) over information you’ve provided (your real policies).
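For readers who want to see the shape of those four steps, here’s a minimal sketch in Python. Everything in it is illustrative: the knowledge base is two hardcoded snippets, and the keyword match stands in for the vector search described in the next section.

```python
# Toy knowledge base: a stand-in for your real document store.
KNOWLEDGE_BASE = {
    "policies/warranty": "Our workmanship warranty on installations covers ...",
    "policies/callouts": "Callouts outside business hours are charged at ...",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    # Stand-in for vector search: rank chunks by words shared with the question.
    q_words = set(question.lower().split())
    ranked = sorted(KNOWLEDGE_BASE.values(),
                    key=lambda text: len(q_words & set(text.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    chunks = retrieve(question)  # step 1: find the relevant documents
    return (                     # steps 2 and 3: hand them to the model, with instructions
        "Answer the customer's question using only the documents below. "
        "If they don't contain the answer, say so.\n\n"
        + "\n\n".join(chunks)
        + "\n\nQuestion: " + question
    )

# Step 4 would send this prompt to a model and return its answer,
# now grounded in your actual policy text.
print(build_prompt("What's your warranty policy on installations?"))
```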

The pieces a RAG system actually needs

For business owners commissioning a RAG system, these are the pieces that show up in the build cost:

1. Document collection. What knowledge base is the AI answering from? The more curated, accurate, and current the source documents, the better the system answers. Garbage in, garbage out applies forcefully.

2. Document chunking. Source documents are broken into smaller pieces (usually a few hundred to a few thousand words). Why: when the AI is given a relevant piece, it focuses on that piece. Giving it a 50-page document and asking it to find the relevant bit is more error-prone.
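As a sketch of the idea (the sizes are illustrative; real systems usually split on headings and paragraphs rather than raw word counts), chunking with a little overlap so a sentence that straddles a boundary still appears whole in at least one chunk:

```python
def chunk_document(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split a document into overlapping chunks of roughly `size` words."""
    words = text.split()
    step = size - overlap  # each chunk starts before the previous one ends
    return [" ".join(words[start:start + size])
            for start in range(0, len(words), step)]
```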

3. Embedding and indexing. Each chunk is processed by an embedding model that converts it into a mathematical representation (a vector). These vectors are stored in a special database (a vector database) that can quickly find semantically similar vectors. This is what enables “find the relevant chunk” to work.
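In production this step is an API call to an embedding model, with the vectors stored in a vector database. The toy version below hashes words into a fixed-size vector purely so the sketch stays self-contained; the shape is the point: each chunk becomes a list of numbers, and similar text produces similar lists.

```python
import math

def embed(text: str, dims: int = 64) -> list[float]:
    # Toy stand-in for a real embedding model: hash each word into a slot.
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[hash(word) % dims] += 1.0
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# The "index": each chunk stored alongside its vector. A vector database
# does exactly this at scale, with fast nearest-neighbour lookup.
chunks = ["Our workmanship warranty on installations covers ...",
          "Callouts outside business hours are charged at ..."]
index = [(embed(c), c) for c in chunks]
```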

4. Retrieval at query time. When a question comes in, it’s embedded the same way, and the most similar document chunks are pulled from the database. Typically the top 3–10 chunks.
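Continuing the toy index above, query-time retrieval is a nearest-neighbour search: embed the question with the same model, score every chunk, and keep the top few.

```python
def retrieve_chunks(question: str,
                    index: list[tuple[list[float], str]],
                    k: int = 5) -> list[str]:
    q_vec = embed(question)  # must use the same embedding model as the chunks
    ranked = sorted(index,
                    key=lambda item: cosine_similarity(q_vec, item[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]

top_chunks = retrieve_chunks("What's your warranty policy?", index, k=3)
```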

5. Prompt construction. The retrieved chunks are formatted into a prompt for the AI model: “Here are some documents. Answer the user’s question only based on these documents. If the documents don’t answer the question, say so.”

6. The AI call itself. A model like Claude or GPT-4 generates the answer.
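Steps 5 and 6 together, as a sketch. The instruction wording is ours and the model ID is illustrative; the messages.create call follows the Anthropic Python SDK, but any model API has an equivalent.

```python
import anthropic

def answer(question: str, top_chunks: list[str]) -> str:
    prompt = (
        "Here are some documents:\n\n"
        + "\n\n---\n\n".join(top_chunks)
        + "\n\nAnswer the user's question only based on these documents. "
        "If the documents don't answer the question, say so.\n\n"
        + "Question: " + question
    )
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative; use whatever model is current
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```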

7. Quality controls. Validation that the answer actually came from the documents (not invented). Citations linking back to source documents. Confidence indicators when the documents don’t cover the question.
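One common guardrail pattern (illustrative, not the only approach): the prompt tells the model to reply with a fixed sentinel when the documents don’t cover the question, and the application layer checks for it before anything reaches the customer, attaching the retrieved chunks as citations when they do.

```python
SENTINEL = "NOT_IN_DOCUMENTS"  # the prompt instructs the model to emit this when stuck
FALLBACK = "I can't answer that based on the available documents."

def package_answer(raw_answer: str, sources: list[str]) -> dict:
    """Wrap a raw model answer with basic quality signals."""
    if SENTINEL in raw_answer or not sources:
        # The model flagged the question as uncovered, or nothing was retrieved:
        # return a consistent fallback rather than a possibly invented answer.
        return {"answer": FALLBACK, "sources": [], "grounded": False}
    return {"answer": raw_answer, "sources": sources, "grounded": True}
```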

The build cost is in steps 1, 7, and the orchestration around all of it. Steps 3–6 are well-trodden technical ground.

Where RAG works well

  • Customer support over a documented knowledge base. The bread-and-butter use case. Done well, RAG handles 60–80% of routine questions accurately.
  • Internal knowledge lookup. “What’s our policy on X?” type questions across HR, IT, operations.
  • Document summarisation grounded in source material. “Summarise this contract’s key obligations.”
  • Cross-document reasoning. “Compare these three procurement contracts and highlight differences in liability terms.”
  • Onboarding and training. Letting new staff ask questions over the company’s documented playbooks.
  • Q&A over technical documentation. Engineering, scientific, or product documentation where the answers are in the docs but not easily searchable.

Where RAG breaks

RAG isn’t a universal fix. It struggles when:

  • The source documents are out of date or contradictory. RAG faithfully retrieves wrong information from wrong documents. Document hygiene is a prerequisite.
  • Questions span many documents. “Which of our 200 customers had complaints last quarter?” isn’t a RAG question — it’s a database query.
  • Reasoning matters more than retrieval. “Should we restructure our pricing model?” isn’t answered by finding documents.
  • Real-time data is required. “What’s the current inventory level?” needs a tool call to your inventory system, not a vector search over documents.
  • The knowledge base is too small. With fewer than 50 source documents, RAG is overkill — just put the documents in the prompt directly.

What a good RAG system costs

For a small business with a focused knowledge base (under a few thousand documents):

  • Build: typically $20,000–$60,000 for a serious build with quality controls and an admin interface
  • Vector database hosting: $50–$300/month
  • Embedding cost (one-off): usually $50–$500 to embed the initial document set
  • Re-embedding (ongoing): small, scales with how often documents change
  • Inference cost: usually $50–$1,000/month depending on volume
  • Maintenance and document refresh: real, not zero

For a larger or more demanding use case (cross-tenant, multilingual, regulated industry), the build can run $80,000–$250,000 with corresponding ongoing costs.

The maintenance side is often underestimated. A RAG system whose source documents go stale stops being trustworthy, and most businesses don’t have the documentation discipline that good RAG depends on. Part of the build investment is the operational layer that keeps the source material current.

When RAG is the wrong answer

Some scenarios genuinely don’t need RAG:

  • Tasks not requiring factual lookup. Drafting, summarising, classifying, extracting — the AI doesn’t need your documents for these.
  • Use cases where a simple search works fine. If your users would be served by a good full-text search, AI on top is over-engineering.
  • Use cases where the knowledge base is volatile. If documents change daily, the operational cost of keeping the index fresh may exceed the value.
  • Cases where a small fixed set of documents fits in the prompt directly. Modern models accept prompts of 100,000+ tokens. For a focused use case with a few documents, you don’t need RAG — just include the documents.
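On that last point, “just include the documents” really is as simple as it sounds. A sketch, with the instruction wording illustrative:

```python
def build_direct_prompt(documents: list[str], question: str) -> str:
    # No retrieval, no index: every document goes into every prompt.
    # Workable whenever the full set fits comfortably in the model's context window.
    return (
        "Answer the question using only the documents below. "
        "If they don't contain the answer, say so.\n\n"
        + "\n\n---\n\n".join(documents)
        + "\n\nQuestion: " + question
    )
```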

Anthropic’s contextual retrieval write-up is the clearest current resource on getting RAG quality right; the engineering effort it describes is real.

Common questions

What does RAG stand for? Retrieval-Augmented Generation. Retrieval finds the relevant documents from your knowledge base; augmentation gives those documents to the model; generation is the model producing an answer based on them.

Do I need RAG for my AI project? Yes, if the AI needs to answer questions about your specific business, products, customers, policies, or any information that isn’t in the model’s general training. No, if the AI is doing a task (drafting, classifying, extracting) that doesn’t require factual lookup of your data.

Is RAG expensive? Per-call, no — embeddings cost cents per million tokens. The expense is the engineering and infrastructure around the index: keeping it fresh, monitoring quality, handling document updates, building admin tools. Most of the cost is in the build, not the running.

Can RAG hallucinate? Less than the same model without RAG, but yes, particularly when the retrieved documents don’t actually contain the answer to the question. Good RAG systems detect this case and respond with “I can’t answer based on the available documents” rather than inventing.

How accurate is RAG? With well-curated source documents and good engineering, factual accuracy on questions covered by the documents typically exceeds 90% — comparable to a competent human reading the same documents. Accuracy degrades on questions the documents don’t cover; the system needs to recognise and acknowledge this.

How long does it take to build a RAG system? For a focused use case with a clear knowledge base, 4–10 weeks for a quality build with proper evaluation, monitoring, and admin tooling. Faster builds exist; they tend to look great in demos and break in production for the reasons covered in where AI breaks.

If you’re scoping a RAG system and want a straight answer on whether it fits your use case, start a project. Sometimes the answer is “you don’t need RAG — here’s a simpler approach.”
