RAG API with LangChain
A RAG API with LangChain replaces the five-component retrieval pipeline most developers build from scratch. Instead of configuring document loaders, text splitters, embedding models, vector stores, and rerankers separately, you call a single search endpoint and feed the results into your LangChain chain. Setup takes minutes, not weeks.
Why LangChain Developers Need a Managed Retrieval API
LangChain is the most popular framework for building LLM applications, and for good reason. It gives you composable abstractions for prompt templates, output parsers, memory, agents, and tool use. But its RAG pipeline requires you to assemble and maintain multiple moving parts yourself.
A typical LangChain RAG setup involves PyPDFLoader or UnstructuredLoader for document parsing, RecursiveCharacterTextSplitter for chunking, an embedding model from one vendor, a vector store from another, and optionally a reranker from a third. That is five components from three to four different providers, each with their own API keys, rate limits, and failure modes.
When retrieval quality is poor, you have to debug across all five components to find the bottleneck. Is the chunking too aggressive? Is the embedding model the wrong choice for your domain? Is the vector store misconfigured? For teams building chatbot applications or customer support tools, this debugging cycle eats into time that should be spent on the LLM features users actually interact with.
Ragex collapses the entire retrieval side into one hosted service. You upload documents, call a search endpoint, and get ranked results back. Parsing, chunking, embedding, indexing, and reranking are all handled for you. Your LangChain code stays focused on what it does best: orchestrating the LLM chain.
How the Integration Works
The architecture splits cleanly between retrieval and generation. The managed API owns everything below the LLM layer, and LangChain owns everything above it.
Here is the data flow:
- Your application receives a user query.
- Your code calls Ragex's /search endpoint with that query. The API returns the top matching document chunks, scored and ranked by relevance.
- You wrap those results as LangChain Document objects.
- LangChain passes the documents into your chain as context, alongside your prompt template.
- Your chosen LLM generates an answer grounded in the retrieved chunks.
The API is stateless and model-agnostic. It returns JSON search results. You control what LLM receives them, how the prompt frames them, and whether the response streams or returns in full. This same retrieval pattern powers document search applications and internal knowledge bases built with different frameworks.
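As a rough sketch of that flow, the snippet below shows what a search response might look like and how the ranked chunks could be flattened into prompt context. The field names here are illustrative assumptions, not the documented schema:

```python
# Hypothetical shape of a search response; field names are illustrative
search_response = {
    "data": {
        "results": [
            {
                "text": "Refunds are accepted within 30 days of purchase.",
                "document_name": "return-policy.pdf",
                "score": 0.91,
                "metadata": {"page": 2},
            },
        ]
    }
}

# Flatten the ranked chunks into a context string for the prompt template
context = "\n\n".join(
    f"[{r['document_name']} | score {r['score']:.2f}]\n{r['text']}"
    for r in search_response["data"]["results"]
)
print(context)
```

Because the API only returns JSON, this formatting step is entirely under your control; you decide how much metadata the LLM sees.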
Complete Working Example: A RAG API with LangChain
The code below shows how to build a LangChain-compatible retriever that wraps Ragex. It implements get_relevant_documents() so you can plug it directly into any LangChain chain. The class handles authentication, sends the user query to the search endpoint, and converts the JSON response into LangChain Document objects with full metadata attached.
```python
import httpx
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from langchain_openai import ChatOpenAI


# Ragex replaces the entire retrieval pipeline.
# Subclassing BaseRetriever lets LangChain chains accept this class directly.
class ManagedRAGRetriever(BaseRetriever):
    api_key: str
    kb_id: str
    base_url: str = "https://api.rag.tech"

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> list[Document]:
        response = httpx.post(
            f"{self.base_url}/api/v1/knowledge-bases/{self.kb_id}/search",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"query": query, "top_k": 5, "rerank": True},
        )
        response.raise_for_status()
        results = response.json()["data"]["results"]
        return [
            Document(
                page_content=r["text"],
                metadata={
                    "source": r["document_name"],
                    "score": r["score"],
                    **r.get("metadata", {}),
                },
            )
            for r in results
        ]


# Usage with LangChain
retriever = ManagedRAGRetriever(
    api_key="rag_live_abc123...",
    kb_id="kb_x1y2z3w4v5",
)

llm = ChatOpenAI(model="gpt-5-mini")

# That's it: retrieval is handled by the managed API.
# No vector store, no embeddings, no chunker to configure.
docs = retriever.get_relevant_documents("What is our return policy?")
print(f"Found {len(docs)} relevant chunks, top score: {docs[0].metadata['score']}")
```
This is a complete, runnable example. The ManagedRAGRetriever class makes one HTTP call per query. No vector database client, no embedding API, no document loader dependencies. If you are comparing this approach to managing your own vector infrastructure, see our Ragex vs Pinecone comparison for a detailed breakdown.
What Changes in Your Codebase
Switching to a managed retrieval API means deleting more code than you write. Here is what changes in a typical LangChain RAG application.
Code you delete:
- Document loader setup (PyPDFLoader, UnstructuredLoader, DirectoryLoader, and all their dependencies)
- Text splitter configuration (chunk size, overlap, separators)
- Embedding model selection and API key management
- Vector store provisioning and maintenance (Pinecone, Chroma, FAISS clients)
- Reranker setup and inference code
- Pipeline debugging logic for tracking down retrieval quality issues
Code you keep:
- LangChain's LLM chain orchestration
- Prompt templates and output parsers
- Memory and conversation management
- Agent and tool configurations
- Your choice of LLM (GPT-5-mini, Claude, Llama, or any other provider)
Code you add:
- One small retriever class (about 20 lines, as shown above) that calls the managed API and returns LangChain Document objects
The net result is fewer dependencies, fewer API keys to manage, and a retrieval layer that improves over time without requiring code changes on your end. When better embedding or reranking approaches become available, they are deployed automatically. Your application benefits without a pull request. Teams exploring alternatives to Pinecone often find that a managed API eliminates entire categories of infrastructure work.
Common Patterns
Chat-with-Documents Using Conversational Retrieval
Many LangChain applications need multi-turn conversations grounded in a document corpus. With the managed retriever, you can plug directly into LangChain's ConversationalRetrievalChain:
```python
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-5-mini"),
    retriever=retriever,  # The ManagedRAGRetriever from above
    return_source_documents=True,
)

chat_history = []
question = "What file types can I upload?"
result = chain.invoke({"question": question, "chat_history": chat_history})
print(result["answer"])
chat_history.append((question, result["answer"]))
```
This pattern works well for customer support chatbots where users ask follow-up questions and expect the system to remember context from earlier in the conversation.
Multi-Source Knowledge Base Queries
If your application queries across multiple knowledge bases — product docs, support tickets, and engineering wikis — you can create multiple retriever instances and merge results:
```python
product_retriever = ManagedRAGRetriever(api_key=API_KEY, kb_id="kb_products")
support_retriever = ManagedRAGRetriever(api_key=API_KEY, kb_id="kb_support")

product_docs = product_retriever.get_relevant_documents(query)
support_docs = support_retriever.get_relevant_documents(query)
all_docs = product_docs + support_docs

# Sort by score and take the top results
all_docs.sort(key=lambda d: d.metadata.get("score", 0), reverse=True)
top_docs = all_docs[:5]
```
This approach is especially useful for internal knowledge base applications where information is spread across departments and systems. The managed API handles 16 file types automatically — PDFs, spreadsheets, scanned documents, and more — so each knowledge base can ingest different document formats without additional parsing code.
Getting Started
Follow these steps to add managed retrieval to your LangChain project. The entire process takes about five minutes. You will create a knowledge base, upload your documents, and wire the retriever into your existing LangChain chain. No infrastructure provisioning required.
Prerequisites:
- Python 3.10 or later
- A Ragex API key (sign up at ragex.dev)
- An existing LangChain project, or a new one
Step 1: Install dependencies
```bash
pip install langchain langchain-openai httpx
```
Step 2: Create a knowledge base and upload documents
Create your knowledge base with one request:
```bash
curl -X POST https://api.useragex.com/api/v1/knowledge-bases \
  -H "Authorization: Bearer $RAGEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-docs"}'
```
Then upload your documents:
```bash
curl -X POST https://api.useragex.com/api/v1/knowledge-bases/KB_ID/documents \
  -H "Authorization: Bearer $RAGEX_API_KEY" \
  -F "file=@./my-document.pdf"
```
The API parses PDFs, DOCX, PPTX, XLSX, images, and plain text formats automatically. Processing takes about 4 seconds for text files and under 60 seconds for a 10-page PDF.
Step 3: Add the retriever class to your LangChain app
Copy the ManagedRAGRetriever class from the complete working example above into your project. Set your API key and knowledge base ID as environment variables.
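For example, the credentials can be read from the environment rather than hardcoded. The variable names RAGEX_API_KEY and RAGEX_KB_ID here are just a convention; use whatever your deployment expects:

```python
import os

# Read credentials from the environment rather than hardcoding them;
# the fallback values are placeholders, not real credentials
api_key = os.environ.get("RAGEX_API_KEY", "rag_live_...")
kb_id = os.environ.get("RAGEX_KB_ID", "kb_...")

# retriever = ManagedRAGRetriever(api_key=api_key, kb_id=kb_id)
```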
Step 4: Use it in any LangChain chain
Replace your existing vector store retriever with the managed retriever. Your prompt templates, memory, and LLM configuration stay exactly the same.
Pricing starts at $29/mo for the Starter plan (500 pages, 5,000 queries). Pro at $79/mo handles 2,000 pages and 15,000 queries. Business at $229/mo supports 6,500 pages and 50,000 queries. Scale at $499/mo supports 15,000 pages and 120,000 queries. For most LangChain prototypes and early-stage products, Starter is enough to get to production.
If you are evaluating other framework integrations, check out our guides for the LlamaIndex integration and the Vercel AI SDK integration.
FAQ
Can I use this as a LangChain retriever?
Yes. Wrap the search endpoint in a class that implements get_relevant_documents() returning LangChain Document objects. The code sample above shows exactly how. Each search result becomes a Document with the chunk text as page_content and metadata including source document name, relevance score, page numbers, and section headings.
Does this work with LangChain's RetrievalQA chain?
Yes. Pass the ManagedRAGRetriever as the retriever argument to RetrievalQA.from_chain_type(). The chain will call get_relevant_documents() on each query, which hits the search API. You can also use it with ConversationalRetrievalChain for chat-with-docs applications — the same approach used in chatbot and document search use cases.
What about LangChain's document loaders? Do I still need them?
No. The API handles document loading, parsing (including OCR for scanned PDFs), chunking, and embedding. You upload files directly to the API instead of loading them through LangChain. This eliminates the need for PyPDFLoader, UnstructuredLoader, DirectoryLoader, and all their dependencies. The API supports 16 file types out of the box, including formats common in healthcare and other document-heavy industries.
How does retrieval quality compare to a DIY LangChain RAG pipeline?
The managed API uses a best-in-class retrieval stack with automatic embedding and cross-encoder reranking enabled by default. Most DIY LangChain setups use basic embeddings without reranking, which produces lower-quality results. With the managed approach, you get accurate, relevant search results without choosing or configuring any models yourself. As retrieval technology improves, the API adopts better approaches automatically — your search quality gets better over time without code changes on your end.
Last updated: 2026-02-20