How to add knowledge base search to a chatbot

Connect Ragex to your chatbot in three steps — upload your knowledge base documents, search on each user message, and pass results as LLM context. Works with any chat framework or LLM provider.

TL;DR: Upload your documents to Ragex, then on each user message, call the search endpoint to find relevant chunks and pass them as context to your LLM. This grounds the chatbot's responses in your actual documentation instead of relying on the model's training data. Setup takes under 5 minutes.

Why do chatbots need knowledge base search?

Without access to your documents, chatbots can only answer from their training data — which does not include your product docs, internal policies, or customer-specific information. Users ask questions like "what is our refund policy?" and the chatbot either hallucinates an answer or says it does not know.

Knowledge base search fixes this by retrieving relevant document chunks before generating a response. The LLM sees your actual content as context and produces accurate, grounded answers. This pattern is called retrieval-augmented generation (RAG).

How do you set it up?

1. Upload your documents

from ragex import RagexClient

client = RagexClient(api_key="YOUR_API_KEY")
kb = client.create_knowledge_base(name="Support KB")

# Upload all your knowledge base documents
docs = []
for file_path in ["faq.pdf", "product-guide.docx", "policies.md"]:
    docs.append(client.upload_document(kb["id"], file_path))

# Documents process concurrently; wait for each to reach ready status before searching

The API parses 16 file types automatically — PDFs, DOCX, spreadsheets, images, markdown, and more. No separate parser per format.
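Processing is asynchronous, so wait for each document to reach ready status before the first search. A minimal polling sketch — it assumes a get_document method that returns the document's current status, which is an illustrative name, not a confirmed part of the client:

```python
import time

def wait_until_ready(client, kb_id: str, doc_id: str,
                     timeout: float = 120.0, interval: float = 2.0) -> dict:
    """Poll a document until it reaches "ready" status, or fail on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        doc = client.get_document(kb_id, doc_id)  # assumed status-check method
        if doc["status"] == "ready":
            return doc
        if doc["status"] == "failed":
            raise RuntimeError(f"Document {doc_id} failed to process")
        time.sleep(interval)
    raise TimeoutError(f"Document {doc_id} not ready after {timeout}s")
```

Because documents process concurrently, polling them one after another still finishes in roughly the time of the slowest document.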

2. Search on each user message

def get_context(user_message: str) -> str:
    results = client.search(
        kb["id"],
        query=user_message,
        top_k=3,
    )
    return "\n\n".join(r["text"] for r in results["results"])

The search endpoint returns ranked text chunks with relevance scores. Reranking is enabled by default, so the top results are sorted by a cross-encoder for better accuracy — not just vector similarity.
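If you want tighter control over what reaches the prompt, filter chunks by score before joining them. A small sketch — the response shape below mirrors the results/text/score fields used in this guide and is assumed, not guaranteed:

```python
def build_context(response: dict, score_threshold: float = 0.5) -> str:
    """Keep only chunks at or above the threshold, joined for the prompt."""
    chunks = [r["text"] for r in response["results"] if r["score"] >= score_threshold]
    return "\n\n".join(chunks)

# Example with a hand-written response in the assumed shape
sample = {
    "results": [
        {"text": "Refunds are available within 30 days of purchase.", "score": 0.91},
        {"text": "Shipping takes 3-5 business days.", "score": 0.34},
    ]
}
```

A threshold like this keeps an off-topic question from dragging weakly related chunks into the prompt.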

3. Pass context to your LLM

def answer(user_message: str) -> str:
    context = get_context(user_message)
    prompt = f"""Answer the user's question using only the context below.
If the context doesn't contain the answer, say you don't know.

Context:
{context}

Question: {user_message}"""
    # Send to OpenAI, Anthropic, or any LLM
    return call_llm(prompt)

This works with any LLM — OpenAI, Anthropic, open-source models running locally. Ragex handles retrieval; your LLM handles generation.
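To exercise the whole loop end to end without real services, the same answer shape can take retrieval and generation as injected callables — everything below is illustrative glue, not Ragex or provider code:

```python
def answer(user_message: str, get_context, call_llm) -> str:
    """Same flow as above, with retrieval and generation injected for testing."""
    context = get_context(user_message)
    prompt = (
        "Answer the user's question using only the context below.\n"
        "If the context doesn't contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_message}"
    )
    return call_llm(prompt)
```

Swapping LLM providers then means swapping only the call_llm argument; the retrieval side never changes.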

How do you handle multi-tenant chatbots?

If your chatbot serves multiple customers (like a SaaS help widget), create a separate knowledge base per customer. Upload each customer's documents to their own knowledge base and scope search queries to the right one:

# Tenant-aware search
results = client.search(
    tenant_kb_id,  # Each customer has their own KB
    query=user_message,
    top_k=3,
)

Alternatively, use a single knowledge base with metadata filtering. Tag documents with a customer ID at upload time and filter by it at search time. Separate knowledge bases give the strongest isolation; the single-KB approach keeps tenants logically separated, but only as long as every search query applies the customer-ID filter.
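The per-customer pattern reduces to a lookup from your tenant ID to its knowledge base ID. A minimal routing sketch — the mapping and helper are illustrative application code, not part of the Ragex client:

```python
# Illustrative tenant -> knowledge base mapping; in production this
# would live in your database alongside the customer record.
TENANT_KBS = {"acme": "kb_acme_123", "globex": "kb_globex_456"}

def search_for_tenant(client, tenant_id: str, user_message: str) -> dict:
    """Scope a search to one tenant's knowledge base; unknown tenants fail closed."""
    kb_id = TENANT_KBS[tenant_id]  # KeyError rather than cross-tenant leakage
    return client.search(kb_id, query=user_message, top_k=3)
```

Failing on an unknown tenant is deliberate: a silent fallback to a default knowledge base is how one customer ends up seeing another customer's content.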

FAQ

Can I update the knowledge base without restarting the chatbot?

Yes. Upload new documents or delete outdated ones at any time. Changes are reflected in search results immediately once new documents reach ready status. Your chatbot code does not need any changes — it queries the same knowledge base endpoint.
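As a sketch, replacing an outdated document is a delete plus an upload — this assumes a delete_document method that mirrors upload_document, which is an illustrative name to check against the client:

```python
def replace_document(client, kb_id: str, old_doc_id: str, new_file_path: str) -> dict:
    """Swap an outdated document for a fresh one without touching chatbot code."""
    client.delete_document(kb_id, old_doc_id)  # assumed method name
    return client.upload_document(kb_id, new_file_path)
```

The chatbot keeps querying the same knowledge base ID throughout, so no restart or redeploy is needed.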

How many documents can a knowledge base hold?

Document limits depend on your plan. The Starter plan at $29/mo supports enough documents for most chatbot use cases. Pro ($79/mo) and Scale ($199/mo) increase limits for larger knowledge bases. All plans include the same search quality with reranking enabled.

What if the search returns irrelevant results?

Adjust top_k to control how many chunks the LLM sees (3-5 is usually ideal for chatbots). Use score_threshold to filter out low-relevance results. You can also use metadata filters to scope searches to specific document categories, reducing noise from unrelated content.
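These three levers can be combined in one call. A hedged sketch — top_k and score_threshold come from this FAQ, while the filters parameter name for metadata filtering is an assumption:

```python
def focused_search(client, kb_id: str, user_message: str) -> dict:
    """Tighter retrieval: fewer chunks, a relevance floor, and a category scope."""
    return client.search(
        kb_id,
        query=user_message,
        top_k=3,                      # keep the prompt small
        score_threshold=0.5,          # drop low-relevance chunks
        filters={"category": "faq"},  # assumed parameter name for metadata filtering
    )
```

Start by lowering top_k, then add a score threshold, and reach for metadata filters only when irrelevant results come from a distinct document category.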


Last updated: 2026-03-09