RAG API with LangChain
A RAG API with LangChain replaces the five-component retrieval pipeline most developers build from scratch. Instead of configuring document loaders, text splitters, embedding models, vector stores, and rerankers separately, you call a single search endpoint and feed the results into your LangChain chain. Setup takes minutes, not weeks.
Why LangChain Developers Need a Managed Retrieval API
LangChain is the most popular framework for building LLM applications, and for good reason. It gives you composable abstractions for prompt templates, output parsers, memory, agents, and tool use. But its RAG pipeline requires you to assemble and maintain multiple moving parts yourself.
A typical LangChain RAG setup involves PyPDFLoader or UnstructuredLoader for document parsing, RecursiveCharacterTextSplitter for chunking, an embedding model from one vendor, a vector store from another, and optionally a reranker from a third. That is five components from three to four different providers, each with their own API keys, rate limits, and failure modes.
When retrieval quality is poor, you have to debug across all five components to find the bottleneck. Is the chunking too aggressive? Is the embedding model the wrong choice for your domain? Is the vector store misconfigured? For teams building chatbot applications or customer support tools, this debugging cycle eats into time that should be spent on the LLM features users actually interact with.
Ragex collapses the entire retrieval side into one hosted service. You upload documents, call a search endpoint, and get ranked results back. Parsing, chunking, embedding, indexing, and reranking are all handled for you. Your LangChain code stays focused on what it does best: orchestrating the LLM chain.
How the Integration Works
The architecture splits cleanly between retrieval and generation. The managed API owns everything below the LLM layer, and LangChain owns everything above it.
Here is the data flow:
- Your application receives a user query.
- Your code calls Ragex's /search endpoint with that query. The API returns the top matching document chunks, scored and ranked by relevance.
- You wrap those results as LangChain Document objects.
- LangChain passes the documents into your chain as context, alongside your prompt template.
- Your chosen LLM generates an answer grounded in the retrieved chunks.
The API is stateless and model-agnostic. It returns JSON search results. You control what LLM receives them, how the prompt frames them, and whether the response streams or returns in full. This same retrieval pattern powers document search applications and internal knowledge bases built with different frameworks.
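As a rough sketch of that flow, the snippet below shows what a search response might look like and how the ranked chunks could be flattened into prompt context. The field names here are illustrative assumptions, not the documented schema:

```python
# Hypothetical shape of a search response; field names are illustrative
search_response = {
    "data": {
        "results": [
            {
                "text": "Refunds are accepted within 30 days of purchase.",
                "document_name": "return-policy.pdf",
                "score": 0.91,
                "metadata": {"page": 2},
            },
        ]
    }
}

# Flatten the ranked chunks into a context string for the prompt template
context = "\n\n".join(
    f"[{r['document_name']} | score {r['score']:.2f}]\n{r['text']}"
    for r in search_response["data"]["results"]
)
print(context)
```

Because the API only returns JSON, this formatting step is entirely under your control; you decide how much metadata the LLM sees.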
Complete Working Example: A RAG API with LangChain
The code below shows how to build a LangChain-compatible retriever that wraps Ragex. It implements get_relevant_documents() so you can plug it directly into any LangChain chain. The class handles authentication, sends the user query to the search endpoint, and converts the JSON response into LangChain Document objects with full metadata attached.
```python
import httpx
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from langchain_openai import ChatOpenAI


# Ragex replaces the entire retrieval pipeline.
# Subclassing BaseRetriever lets LangChain chains accept this class directly.
class ManagedRAGRetriever(BaseRetriever):
    api_key: str
    kb_id: str
    base_url: str = "https://api.rag.tech"

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> list[Document]:
        response = httpx.post(
            f"{self.base_url}/api/v1/knowledge-bases/{self.kb_id}/search",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"query": query, "top_k": 5, "rerank": True},
        )
        response.raise_for_status()
        results = response.json()["data"]["results"]
        return [
            Document(
                page_content=r["text"],
                metadata={
                    "source": r["document_name"],
                    "score": r["score"],
                    **r.get("metadata", {}),
                },
            )
            for r in results
        ]


# Usage with LangChain
retriever = ManagedRAGRetriever(
    api_key="rag_live_abc123...",
    kb_id="kb_x1y2z3w4v5",
)

llm = ChatOpenAI(model="gpt-5-mini")

# That's it: retrieval is handled by the managed API.
# No vector store, no embeddings, no chunker to configure.
docs = retriever.get_relevant_documents("What is our return policy?")
print(f"Found {len(docs)} relevant chunks, top score: {docs[0].metadata['score']}")
```
This is a complete, runnable example. The ManagedRAGRetriever class makes one HTTP call per query. No vector database client, no embedding API, no document loader dependencies. If you are comparing this approach to managing your own vector infrastructure, see our Ragex vs Pinecone comparison for a detailed breakdown.
What Changes in Your Codebase
Switching to a managed retrieval API means deleting more code than you write. Here is what changes in a typical LangChain RAG application.
Code you delete:
- Document loader setup (PyPDFLoader, UnstructuredLoader, DirectoryLoader, and all their dependencies)
- Text splitter configuration (chunk size, overlap, separators)
- Embedding model selection and API key management
- Vector store provisioning and maintenance (Pinecone, Chroma, FAISS clients)
- Reranker setup and inference code
- Pipeline debugging logic for tracking down retrieval quality issues
Code you keep:
- LangChain's LLM chain orchestration
- Prompt templates and output parsers
- Memory and conversation management
- Agent and tool configurations
- Your choice of LLM (GPT-5-mini, Claude, Llama, or any other provider)
Code you add:
- One small retriever class (about 20 lines, as shown above) that calls the managed API and returns LangChain Document objects
The net result is fewer dependencies, fewer API keys to manage, and a retrieval layer that improves over time without requiring code changes on your end. When better embedding or reranking approaches become available, they are deployed automatically. Your application benefits without a pull request. Teams exploring alternatives to Pinecone often find that a managed API eliminates entire categories of infrastructure work.
Common Patterns
Chat-with-Documents Using Conversational Retrieval
Many LangChain applications need multi-turn conversations grounded in a document corpus. With the managed retriever, you can plug directly into LangChain's ConversationalRetrievalChain:
```python
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-5-mini"),
    retriever=retriever,  # The ManagedRAGRetriever from above
    return_source_documents=True,
)

chat_history = []
question = "What file types can I upload?"
result = chain.invoke({"question": question, "chat_history": chat_history})
print(result["answer"])
chat_history.append((question, result["answer"]))
```
This pattern works well for customer support chatbots where users ask follow-up questions and expect the system to remember context from earlier in the conversation.
Multi-Source Knowledge Base Queries
If your application queries across multiple knowledge bases — product docs, support tickets, and engineering wikis — you can create multiple retriever instances and merge results:
```python
product_retriever = ManagedRAGRetriever(api_key=API_KEY, kb_id="kb_products")
support_retriever = ManagedRAGRetriever(api_key=API_KEY, kb_id="kb_support")

product_docs = product_retriever.get_relevant_documents(query)
support_docs = support_retriever.get_relevant_documents(query)
all_docs = product_docs + support_docs

# Sort by score and take the top results
all_docs.sort(key=lambda d: d.metadata.get("score", 0), reverse=True)
top_docs = all_docs[:5]
```
This approach is especially useful for internal knowledge base applications where information is spread across departments and systems. The managed API handles 16 file types automatically — PDFs, spreadsheets, scanned documents, and more — so each knowledge base can ingest different document formats without additional parsing code.
Getting Started
Follow these steps to add managed retrieval to your LangChain project. The entire process takes about five minutes. You will create a knowledge base, upload your documents, and wire the retriever into your existing LangChain chain. No infrastructure provisioning required.
Prerequisites:
- Python 3.10 or later
- A Ragex API key (sign up at ragex.dev)
- An existing LangChain project, or a new one
Step 1: Install dependencies
```bash
pip install langchain langchain-openai httpx
```
Step 2: Create a knowledge base and upload documents
Create your knowledge base with one request:
```bash
curl -X POST https://api.useragex.com/api/v1/knowledge-bases \
  -H "Authorization: Bearer $RAGEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-docs"}'
```
Then upload your documents:
```bash
curl -X POST https://api.useragex.com/api/v1/knowledge-bases/KB_ID/documents \
  -H "Authorization: Bearer $RAGEX_API_KEY" \
  -F "file=@./my-document.pdf"
```
The API parses PDFs, DOCX, PPTX, XLSX, images, and plain text formats automatically. Processing takes about 4 seconds for text files and under 60 seconds for a 10-page PDF.
Step 3: Add the retriever class to your LangChain app
Copy the ManagedRAGRetriever class from the complete working example above into your project. Set your API key and knowledge base ID as environment variables.
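For example, the credentials can be read from the environment rather than hardcoded. The variable names RAGEX_API_KEY and RAGEX_KB_ID here are just a convention; use whatever your deployment expects:

```python
import os

# Read credentials from the environment rather than hardcoding them;
# the fallback values are placeholders, not real credentials
api_key = os.environ.get("RAGEX_API_KEY", "rag_live_...")
kb_id = os.environ.get("RAGEX_KB_ID", "kb_...")

# retriever = ManagedRAGRetriever(api_key=api_key, kb_id=kb_id)
```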
Step 4: Use it in any LangChain chain
Replace your existing vector store retriever with the managed retriever. Your prompt templates, memory, and LLM configuration stay exactly the same.
Pricing starts at $29/mo for the Starter plan (500 pages, 5,000 queries). Pro at $79/mo handles 2,000 pages and 15,000 queries. Business at $229/mo supports 6,500 pages and 50,000 queries. Scale at $499/mo supports 15,000 pages and 120,000 queries. For most LangChain prototypes and early-stage products, Starter is enough to get to production.
If you are evaluating other framework integrations, check out our guides for the LlamaIndex integration and the Vercel AI SDK integration.
FAQ
Can I use this as a LangChain retriever?
Yes. Wrap the search endpoint in a class that implements get_relevant_documents() returning LangChain Document objects. The code sample above shows exactly how. Each search result becomes a Document with the chunk text as page_content and metadata including source document name, relevance score, page numbers, and section headings.
Does this work with LangChain's RetrievalQA chain?
Yes. Pass the ManagedRAGRetriever as the retriever argument to RetrievalQA.from_chain_type(). The chain will call get_relevant_documents() on each query, which hits the search API. You can also use it with ConversationalRetrievalChain for chat-with-docs applications — the same approach used in chatbot and document search use cases.
What about LangChain's document loaders? Do I still need them?
No. The API handles document loading, parsing (including OCR for scanned PDFs), chunking, and embedding. You upload files directly to the API instead of loading them through LangChain. This eliminates the need for PyPDFLoader, UnstructuredLoader, DirectoryLoader, and all their dependencies. The API supports 16 file types out of the box, including formats common in healthcare and other document-heavy industries.
How does retrieval quality compare to a DIY LangChain RAG pipeline?
The managed API uses a best-in-class retrieval stack with automatic embedding and cross-encoder reranking enabled by default. Most DIY LangChain setups use basic embeddings without reranking, which produces lower-quality results. With the managed approach, you get accurate, relevant search results without choosing or configuring any models yourself. As retrieval technology improves, the API adopts better approaches automatically — your search quality gets better over time without code changes on your end.
Last updated: 2026-02-20