RAG, web search, fine-tuning or rules: choosing for an LLM system

A guide to choosing between RAG, web search, fine-tuning and deterministic rules when building assistants and LLM automations.

When an AI assistant answers incorrectly, a common reaction is to assume that it needs a more powerful model. Often, the real problem is elsewhere: the system does not have the right knowledge, uses outdated information or asks the LLM to make a decision that should be defined by rules.

RAG, web search, fine-tuning and deterministic logic solve different problems. Choosing correctly between them often adds more value than switching providers or writing a much longer prompt.

Direct answer: when to choose each approach

I would use RAG when the assistant must consult private documents or internal knowledge that changes over time.
I would use web search when it needs public, recent information that can be verified through sources.
I would use fine-tuning when I want to consistently change model behaviour, format or style using many high-quality examples.
I would use deterministic rules when a decision must be predictable, auditable and always enforced.
I would combine techniques when the system needs knowledge, reasoning and real actions.

Need	First option
Answer questions about internal manuals	RAG
Research recent public news or data	Web search
Repeat a specialised format consistently	Fine-tuning
Validate permissions, limits or business conditions	Rules
Execute a complete business process	Hybrid architecture

The important question is not "which technique is best", but which part of the problem each one is meant to solve.

Before choosing: separate knowledge, behaviour and decisions

Many AI projects mix three needs:

Knowledge: which information the model needs to answer.
Behaviour: how it should interpret and write.
Decision: which actions are allowed and under which conditions.

RAG and web search mainly provide knowledge. Fine-tuning mainly changes behaviour. Rules control decisions.

This separation stops you from asking a single technique to do something it was not designed for. For example, training a model on an internal policy does not guarantee it will always apply the latest version. If a policy can change, it is usually better to retrieve it from a current source and validate its conditions in code.

What RAG is and when it adds value

RAG means Retrieval-Augmented Generation. Before generating an answer, the system searches for relevant passages in a document collection and provides them to the LLM as context.

A simplified architecture looks like this:

Question
    -> document search
        -> relevant passages
            -> LLM generates an answer with context
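
The flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production retriever: it scores passages by word overlap instead of embeddings, and `PASSAGES`, `tokenize`, `retrieve` and `build_prompt` are hypothetical names invented for the example. The LLM call itself is left out; the sketch stops at the prompt that would be sent.

```python
import re

# Hypothetical in-memory collection; a real system would query an index.
PASSAGES = [
    {"source": "hr-manual.md", "text": "Employees may carry over five unused vacation days."},
    {"source": "it-policy.md", "text": "Laptops must be encrypted before leaving the office."},
    {"source": "travel.md", "text": "Travel expenses require approval from a manager."},
]


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=2):
    """Rank passages by the number of words they share with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(p["text"])), p) for p in passages]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in matches[:k]]


def build_prompt(question, context):
    """Assemble the text that would be sent to the LLM."""
    if not context:
        return f"No relevant passage was found. Say so instead of guessing.\n\nQuestion: {question}"
    sources = "\n".join(f"[{p['source']}] {p['text']}" for p in context)
    return (
        "Answer using only the passages below and cite their sources.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )


question = "How many vacation days can employees carry over?"
print(build_prompt(question, retrieve(question, PASSAGES)))
```

Keeping the source name next to each passage is what later makes citation possible. The empty-context branch is a design choice: I would rather the model admit that nothing was found than answer from nothing.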

RAG is a good fit for:

internal manuals;
technical documentation;
business procedures;
knowledge bases;
product catalogues and specifications;
contracts or regulations with controlled access;
histories that must be queried under permissions.

Its main advantage is that knowledge can be updated without retraining the model. If a document changes, it can be reprocessed so that the next query retrieves the new version.
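
One way to decide what needs reprocessing is to compare content hashes. The sketch below assumes that the ingestion step stores one hash per indexed document; `content_hash`, `plan_reindex` and the document identifiers are hypothetical names, and the actual chunking and indexing are out of scope.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(documents, indexed_hashes):
    """Return (ids to index again, ids to delete from the index).

    documents:      current {doc_id: text} from the source of truth
    indexed_hashes: {doc_id: hash} recorded when each document was last indexed
    """
    to_index = [
        doc_id
        for doc_id, text in documents.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    ]
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in documents]
    return sorted(to_index), sorted(to_delete)


current = {"a": "v1", "b": "v2 changed", "c": "new"}
indexed = {"a": content_hash("v1"), "b": content_hash("v2"), "d": content_hash("old")}
print(plan_reindex(current, indexed))
```

Deleting withdrawn documents matters as much as indexing new ones: a passage that stays in the index can still be retrieved and cited.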

RAG is not simply storing PDFs in a vector database

Quality depends on many decisions:

which documents are included;
how they are cleaned and split;
which metadata is preserved;
how permissions are applied;
which search method is used;
how many passages are retrieved;
how they are ranked and deduplicated;
how sources are cited.
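
Several of these decisions live in plain code between the search step and the prompt. The sketch below assumes that each passage carries a relevance score and an `allowed_groups` field preserved at ingestion time; `Passage` and `select_passages` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    score: float                # relevance score from the search step
    allowed_groups: frozenset   # permission metadata kept at ingestion time


def select_passages(candidates, user_groups, k=3):
    """Apply permissions, drop duplicate text and keep the k best passages."""
    # Permissions come first, so restricted text never reaches the prompt.
    visible = [p for p in candidates if p.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda p: p.score, reverse=True)
    seen, selected = set(), []
    for passage in ranked:
        key = " ".join(passage.text.lower().split())  # normalise case and spacing
        if key in seen:
            continue
        seen.add(key)
        selected.append(passage)
        if len(selected) == k:
            break
    return selected


candidates = [
    Passage("a", "Refunds take 5 days.", 0.90, frozenset({"support"})),
    Passage("b", "refunds  take 5 days.", 0.80, frozenset({"support"})),
    Passage("c", "Salary bands for 2024.", 0.95, frozenset({"hr"})),
    Passage("d", "Refunds need a receipt.", 0.50, frozenset({"support", "hr"})),
]
print([p.doc_id for p in select_passages(candidates, {"support"})])
```

Exact-text deduplication is the simplest possible choice here; near-duplicate detection would need a similarity measure that this sketch does not attempt.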

If the retriever returns irrelevant passages, the LLM will answer from poor context. Before evaluating the final writing, it is worth measuring whether the system found the correct information.

When to use web search

Web search is appropriate when the required knowledge is public, external and may have changed recently. Examples include:

news;
corporate websites;
market information;
updated public documentation;
academic publications;
registries and official sources;
OSINT research.

Unlike an internal RAG collection, the web is not controlled. Sources may contradict each other, be outdated or attempt to manipulate the agent. The result should therefore preserve source links, publication and retrieval dates, the supporting evidence and a confidence level.
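
A small data structure can make that requirement concrete. This is a sketch under my own assumptions: `Evidence` and `Claim` are hypothetical types, and the confidence label is a crude heuristic that only counts distinct domains. It does not check whether those domains are independent or trustworthy.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Evidence:
    url: str
    retrieved_on: date
    quote: str                            # exact passage that supports the claim
    published_on: Optional[date] = None   # None when the page shows no date


@dataclass
class Claim:
    statement: str
    evidence: list = field(default_factory=list)

    def confidence(self):
        """Heuristic label: count the distinct domains behind the claim."""
        domains = {urlparse(e.url).netloc for e in self.evidence}
        if not domains:
            return "unsupported"
        if len(domains) == 1:
            return "single-source"
        return "corroborated"


claim = Claim("The vendor released version 2 of its API.")
claim.evidence.append(
    Evidence("https://example.com/changelog", date(2024, 5, 1), "Version 2 is now available.")
)
print(claim.confidence())
```

Storing the quoted passage next to the link means a reviewer can check the claim without repeating the search.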

In my guide to OSINT with LLMs and web search, I explain this principle in more detail: a model can accelerate research, but every important conclusion should be traceable back to its source.

RAG and web search are not opposites

An enterprise assistant can use both:

RAG to consult private policies and documents;
web search to obtain recent public context;
rules to decide which sources or actions are allowed.

The answer should clearly distinguish what comes from internal knowledge and what comes from an external source.

When fine-tuning makes sense

Fine-tuning adjusts a model using examples so that it reproduces a particular behaviour more effectively. It can be useful when many high-quality input-output pairs exist and the objective is consistent.

I would consider it for:

specialised, repetitive classification;
highly specific output formats;
stable tone or style;
domain vocabulary;
tasks where a tuned smaller model can replace a more expensive general model;
behaviours that have already proved successful through prompting.

I would not use it as the first solution for keeping facts current. Knowledge incorporated during training is not a database that can be easily edited, cited or deleted.

It also requires a genuinely good example set. Tuning on inconsistent data only makes the model repeat inconsistencies more confidently.
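
A cheap first check is to look for identical inputs that were labelled differently. The sketch assumes examples stored as dictionaries with `input` and `output` keys, which is my own convention for the illustration; `find_conflicts` is a hypothetical helper.

```python
from collections import defaultdict


def find_conflicts(examples):
    """Return normalised inputs that appear with more than one distinct output."""
    outputs_by_input = defaultdict(set)
    for example in examples:
        key = " ".join(example["input"].lower().split())  # ignore case and spacing
        outputs_by_input[key].add(example["output"])
    return {
        key: sorted(outputs)
        for key, outputs in outputs_by_input.items()
        if len(outputs) > 1
    }


examples = [
    {"input": "Reset my password", "output": "it_support"},
    {"input": "reset my  password", "output": "security"},
    {"input": "Book a meeting room", "output": "facilities"},
]
print(find_conflicts(examples))
```

This only catches literal contradictions. Paraphrased inputs with conflicting labels would need human review or a similarity search.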

Before fine-tuning

I would first check:

whether a clear prompt solves the problem;
whether structured output improves consistency;
whether examples or context are missing;
whether the current evaluation actually measures the task;
whether volume justifies preparing data, training and maintaining versions.

Fine-tuning should address a measured limitation, not a feeling that the model "could be more trained".

When rules are the best tool

Deterministic rules are less eye-catching, but they are essential. I would use them when a condition must always hold:

permissions and authorisation;
spending limits;
schedules and capacity;
required fields;
calculations;
state transitions;
duplicate prevention;
enforcement of explicit policies.

An LLM can interpret that a user wants to cancel a reservation. A rule must then verify that the reservation exists, that it belongs to that user and that it can still be cancelled, and determine what consequences the cancellation has.

LLM interprets intent
    -> rules validate conditions
        -> service executes action
            -> system records the result
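
The cancellation example can be sketched as a plain function. The 24-hour policy, the `Reservation` type and `validate_cancellation` are assumptions made for the illustration; the point is that the LLM only supplies the reservation identifier, while every condition is checked in code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reservation:
    id: str
    user_id: str
    starts_at: datetime
    status: str = "confirmed"


# Hypothetical policy: cancelling is free until 24 hours before the start.
FREE_CANCELLATION_WINDOW = timedelta(hours=24)


def validate_cancellation(reservations, reservation_id, user_id, now):
    """Return (allowed, reason) for a cancellation request."""
    reservation = reservations.get(reservation_id)
    if reservation is None:
        return False, "reservation not found"
    if reservation.user_id != user_id:
        return False, "reservation belongs to another user"
    if reservation.status != "confirmed":
        return False, f"reservation is already {reservation.status}"
    if reservation.starts_at <= now:
        return False, "reservation has already started"
    if reservation.starts_at - now < FREE_CANCELLATION_WINDOW:
        return True, "allowed with late-cancellation fee"
    return True, "allowed without fee"


now = datetime(2024, 5, 1, 12, 0)
reservations = {"r1": Reservation("r1", "alice", datetime(2024, 5, 3, 12, 0))}
print(validate_cancellation(reservations, "r1", "alice", now))
```

Because the function is deterministic and takes `now` as a parameter, each branch can be covered by an ordinary unit test.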

This division also appears in my article about reliable production AI automations: AI provides flexibility, but real operations require contracts, state and idempotency.

Example: an internal company assistant

Imagine an assistant that helps employees answer questions and request internal actions.

RAG

It retrieves procedures, manuals and FAQs according to the employee's department and permissions.

Web search

It checks public vendor documentation or recent external information when policy allows it.

Fine-tuning

It could be used if thousands of reviewed examples exist for classifying internal requests with a highly specific taxonomy.

Rules and services

They verify identity, permissions and required fields, then execute authorised actions.

The result is not merely "a chatbot with RAG". It is a system where each component has a clear responsibility.

Reference hybrid architecture

User request
    -> classification and interpretation
        -> retrieve internal documents with RAG
        -> search public sources when necessary
        -> generate answer or propose a tool
            -> validate through rules
                -> execute or escalate to a person

This architecture keeps knowledge and decisions separate:

documents can be updated;
external sources remain cited;
model behaviour can be evaluated;
rules can be tested;
actions leave an audit trail.
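
The last two steps of that flow, validating and then executing or escalating, can be sketched as follows. Everything here is hypothetical: the `refund` tool, its 100-unit limit, and the `Proposal`, `Outcome` and `handle` names. The model's proposal is treated as untrusted input, and tools without a rule are escalated by default.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    tool: str    # tool name proposed by the model
    args: dict   # arguments proposed by the model


@dataclass
class Outcome:
    status: str   # "executed" or "escalated"
    detail: str
    audit: list = field(default_factory=list)


def refund_rule(args):
    """Return an error message, or None when the refund is allowed."""
    amount = args.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        return "amount missing or invalid"
    if amount > 100:   # hypothetical limit
        return "amount above limit"
    return None


RULES = {"refund": refund_rule}


def handle(proposal, rules, execute):
    """Validate a proposed tool call, then execute it or escalate to a person."""
    audit = [f"proposed {proposal.tool} {proposal.args}"]
    check = rules.get(proposal.tool)
    if check is None:
        audit.append("escalated: no rule for this tool")
        return Outcome("escalated", "unknown tool", audit)
    error = check(proposal.args)
    if error:
        audit.append(f"escalated: {error}")
        return Outcome("escalated", error, audit)
    result = execute(proposal)
    audit.append(f"executed: {result}")
    return Outcome("executed", result, audit)


outcome = handle(Proposal("refund", {"amount": 40}), RULES, lambda p: "refund issued")
print(outcome.status, outcome.audit)
```

Passing `execute` as a parameter keeps the rule layer testable without touching a real service.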

How I would evaluate each component

I would not use a single metric for the whole system.

Component	Evaluation questions
RAG	Does it retrieve the correct document? Does the passage contain the answer?
Web search	Does it prioritise good sources? Do citations support claims?
Fine-tuning	Does it outperform the base model on the task? Does it generalise to unseen examples?
Rules	Do they cover limits and edge cases? Are they consistent and testable?
Final answer	Is it correct, clear, useful and honest about its limits?

For RAG, I would measure retrieval first: precision, coverage and passage ranking. An incorrect answer may mean that the model reasoned poorly or that it never received the right document.
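
As a sketch, those three measurements can be computed per question from a ranked list of retrieved document identifiers and a set of known relevant ones. I am mapping "coverage" to recall at k and "ranking" to reciprocal rank, which is my reading rather than the only option; the function names and sample identifiers are illustrative.

```python
def precision_at_k(retrieved, relevant, k):
    """Share of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Share of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / position of the first relevant result, or 0.0 if there is none."""
    for position, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


retrieved = ["d3", "d1", "d7", "d2"]   # ranked output of the retriever
relevant = {"d1", "d2"}                # labelled by a person for this question
print(precision_at_k(retrieved, relevant, 2))
print(recall_at_k(retrieved, relevant, 4))
print(reciprocal_rank(retrieved, relevant))
```

Averaging these values over a labelled question set gives a retrieval baseline that can be tracked separately from answer quality.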

For fine-tuning, I would keep an evaluation set separate from training data. Testing on examples the model has already seen can look excellent without demonstrating real ability.
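
One way to keep the two sets apart is to assign each example to a split from a hash of its normalised input, so duplicates and near-identical spellings always land on the same side. This is a sketch of that idea; `normalise`, `split_bucket` and the 20% evaluation share are assumptions for the example.

```python
import hashlib


def normalise(text):
    """Collapse case and whitespace so trivial variants share one key."""
    return " ".join(text.lower().split())


def split_bucket(text, eval_fraction=0.2):
    """Deterministically assign an example to 'train' or 'eval'."""
    digest = hashlib.sha256(normalise(text).encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return "eval" if fraction < eval_fraction else "train"


inputs = ["Reset my password", "reset  my PASSWORD", "Book a meeting room"]
for text in inputs:
    print(split_bucket(text), "|", text)
```

Because the assignment depends only on the input text, adding new examples later never moves an existing one from the training set into the evaluation set.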

Common mistakes

Using RAG for data that should come from an API

If I need a current balance, order status or exact availability, I would query the official system. RAG serves documentary knowledge; it does not replace transactional data.

Using fine-tuning to memorise changing information

Updating documents and retrieving their latest version is usually more controllable than retraining.

Allowing web search to decide without verifying sources

Finding a page does not make its content true. Important claims need provenance and cross-checking.

Hiding rules inside the prompt

An instruction such as "never exceed this limit" does not replace validation in code when the limit has real impact.

Combining everything before measuring the simple approach

Every component adds cost and complexity. I would start with the minimum architecture that makes the problem measurable and add pieces only when they solve an observed limitation.

My final criterion

RAG, web search, fine-tuning and rules do not compete to do exactly the same thing:

RAG delivers private, controlled knowledge;
web search provides public, recent information;
fine-tuning specialises repeatable behaviour;
rules protect decisions and actions.

The right architecture often combines some of them, but it does not need all of them.

The best LLM system is not the one with the most techniques, but the one that knows where probability ends and certainty must begin.

You can also read my guide to choosing between n8n, FastAPI and Spring or review the AI systems I have built for NexaVision AI. For the quality layer, I also explain how I evaluate AI agents before production.