RAG implementation without LangChain

You do not need LangChain to build RAG. Ragex gives you document retrieval through direct REST calls or lightweight SDKs — no framework abstractions, no dependency chains, no version conflicts.

TL;DR: You do not need LangChain to implement RAG. Ragex provides document parsing, embedding, indexing, and search through direct REST calls or lightweight SDKs (Python, TypeScript). No framework abstractions, no 200+ transitive dependencies, no version conflicts. Five API calls get you from zero to search results.

Why do developers look for alternatives to LangChain?

LangChain is a popular framework for building LLM applications, but it comes with tradeoffs that frustrate many developers:

  • Heavy dependency tree — LangChain pulls in hundreds of transitive dependencies, which creates version conflicts and slows down builds
  • Abstraction overhead — simple operations like "upload a document and search it" get wrapped in chains, loaders, retrievers, and splitters that are hard to debug
  • Tight coupling — switching vector databases or embedding models requires changing multiple classes and configurations
  • Fast-moving API — breaking changes between versions are common, requiring frequent code updates

For developers who just want to add document search to an app, LangChain adds complexity without proportional value.

How do you build RAG without a framework?

The simplest path is a managed API such as Ragex, which exposes the entire retrieval pipeline as REST endpoints. You do not need a framework to orchestrate components because there are no components to orchestrate.

from ragex import RagexClient
import time

client = RagexClient(api_key="YOUR_API_KEY")

# Create a knowledge base
kb = client.create_knowledge_base(name="Product Docs")

# Upload a document (API handles parse, chunk, embed, index)
doc = client.upload_document(kb["id"], "guide.pdf")

# Poll until processing finishes, then surface failures explicitly
while doc["status"] not in ("ready", "failed"):
    time.sleep(2)
    doc = client.get_document(kb["id"], doc["id"])
if doc["status"] == "failed":
    raise RuntimeError(f"Processing failed for document {doc['id']}")

# Search (with reranking enabled by default)
results = client.search(kb["id"], query="setup instructions", top_k=5)

# Pass to your LLM
context = "\n".join(r["text"] for r in results["results"])

Compare this to the LangChain equivalent, which requires a document loader, text splitter, embedding model, vector store, and retriever — each as a separate class with its own configuration. The managed API does the same thing in fewer lines with fewer dependencies.

What is the dependency footprint?

Approach                          Dependencies          Install Size
LangChain + Pinecone + OpenAI     200+ packages         150-300 MB
Managed RAG API (Python SDK)      httpx only            ~5 MB
Managed RAG API (TypeScript SDK)  zero (native fetch)   ~50 KB

The Python SDK depends only on httpx. The TypeScript SDK has zero external dependencies — it uses native fetch available in Node.js 18+. Your application stays lean and builds stay fast.

What about advanced RAG patterns?

Managed RAG APIs handle the most common retrieval patterns out of the box:

  • Semantic search — natural language queries match by meaning, not keywords
  • Reranking — cross-encoder re-scores results for better relevance (enabled by default)
  • Metadata filtering — scope searches by department, version, access level, or any custom field
  • Multi-format parsing — 16 file types parsed automatically (PDF, DOCX, images, spreadsheets)

For patterns like query decomposition, multi-step retrieval, or agent-based RAG, you can compose API calls in plain code. A search() call is just an HTTP request — you do not need a framework to call it twice or combine results from multiple knowledge bases.
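As a minimal sketch of that kind of composition, the function below merges ranked results from several knowledge bases in plain Python. It assumes each search() response carries a score field alongside text (the earlier example only shows text, so treat the key name as an assumption about the response shape):

```python
def merge_results(result_sets, top_k=5):
    """Combine ranked results from multiple search() responses.

    Assumes each result dict has "text" and (by assumption) "score" keys.
    """
    merged = [r for rs in result_sets for r in rs["results"]]
    merged.sort(key=lambda r: r.get("score", 0.0), reverse=True)

    # Drop duplicate chunks that appear in more than one knowledge base
    seen, unique = set(), []
    for r in merged:
        if r["text"] not in seen:
            seen.add(r["text"])
            unique.append(r)
    return unique[:top_k]
```

Because each response is just a dict, the same pattern extends to query decomposition: run one search() per sub-query, then merge.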

When might you still want a framework?

Frameworks add value when you need complex orchestration: multi-step agents that decide which tools to call, chains that combine multiple API calls with conditional logic, or streaming responses that interleave retrieval and generation. If your use case is "search documents and pass results to an LLM," a framework is overhead you do not need.

FAQ

Can I use Ragex with my own LLM integration code?

Yes. The API returns ranked text chunks — plain strings with relevance scores. Pass them to any LLM using whatever client library you already use (OpenAI SDK, Anthropic SDK, raw HTTP). The retrieval and generation steps are completely independent.
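As one hedged sketch of that handoff, the helper below turns retrieved chunk texts into a messages list in the chat format used by the OpenAI and Anthropic SDKs. The numbering scheme and system-prompt wording are illustrative choices, not part of the API:

```python
def build_messages(chunks, question):
    """Assemble chat messages from retrieved chunk texts.

    Numbers each chunk so the model can cite its sources.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    system = (
        "Answer using only the context below. "
        "Cite chunk numbers in square brackets.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list can be passed directly to any chat-completions client; retrieval and generation never touch the same library.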

Does the managed API support streaming?

Search results are returned in a single JSON response, not streamed. Search typically completes in milliseconds, so streaming the retrieval step provides no benefit. Streaming is relevant for LLM generation, which you handle separately with your LLM provider.

What if I already have LangChain in my project?

You can use Ragex alongside LangChain or replace LangChain's retrieval components with direct API calls while keeping the rest of your LangChain setup. Migration is incremental — swap out the vector store and document loader, keep the prompt templates and chains.
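The swap can be as small as one function. This sketch assumes a client exposing the search() method shown earlier; the function itself has no LangChain dependency, so it can replace a retriever call site without touching the rest of the chain:

```python
def retrieve_context(client, kb_id, query, top_k=5):
    """Drop-in replacement for a retriever: returns ranked chunk texts.

    `client` is any object with the search(kb_id, query=..., top_k=...)
    method shown in the earlier example.
    """
    response = client.search(kb_id, query=query, top_k=top_k)
    return [r["text"] for r in response["results"]]
```

Wherever your chain previously called retriever.invoke(query), call retrieve_context() instead and feed the strings into your existing prompt templates.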


Last updated: 2026-03-09