RAG API for document search
Ragex search replaces keyword matching with semantic retrieval across your entire document library. Upload files in any of 16 supported formats, call the search endpoint with a natural-language query, and get ranked results your app can display directly or pass to an LLM for answer generation.
The Document Search Problem
Traditional keyword search fails the moment users phrase queries differently from the text in their files. A developer searching for "how to cancel a subscription" gets zero results because the documentation says "account termination procedure." A legal team querying "data breach notification requirements" misses the relevant clause titled "Incident Response Obligations." The words don't match, so the results don't either.
Building semantic search from scratch is no small task. You need to choose an embedding model, set up a vector database, configure a reranker, select a document parser, and design a chunking strategy. That's five components from different vendors, each with its own API, configuration, and failure modes. Most teams spend weeks wiring this infrastructure together before returning a single relevant result to a user.
The cost isn't just the initial build. Every component needs maintenance — embedding models get updated, vector databases require scaling, parsers break on edge-case formats. Teams that build customer support search or internal knowledge bases from scratch consistently underestimate this ongoing burden.
How It Works
Ragex collapses the entire retrieval pipeline into three endpoints. No vector database to manage, no embedding model to choose, no chunking logic to debug.
The workflow:
- Create a knowledge base for your document collection
- Upload documents in any of 16 supported formats — the API parses, chunks, embeds, and indexes them automatically
- Search with natural language and get semantically ranked results
Here's what the API flow looks like in practice:
```
POST /v1/knowledge-bases                    → Create a knowledge base
POST /v1/knowledge-bases/:kb_id/documents   → Upload PDFs, Word docs, spreadsheets
POST /v1/knowledge-bases/:kb_id/search      → Search with a natural-language query
                                            → Feed top results as context to your LLM or display directly
```
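The flow above can be sketched as plain HTTP request shapes. This is a sketch, not the official client: the base URL, bearer-token header, and body field names here are assumptions — check the API reference for the exact contract.

```python
# Sketch of the three calls as request descriptions. The base URL, auth
# header, and body fields are assumptions, not confirmed API details.
BASE = "https://api.ragex.example/v1"  # hypothetical base URL


def create_kb_request(api_key, name):
    """POST /v1/knowledge-bases — create a knowledge base."""
    return {
        "method": "POST",
        "url": f"{BASE}/knowledge-bases",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"name": name},
    }


def upload_document_request(api_key, kb_id, file_path):
    """POST /v1/knowledge-bases/:kb_id/documents — upload one file.

    In a real call you'd pass an open file handle, not the path string.
    """
    return {
        "method": "POST",
        "url": f"{BASE}/knowledge-bases/{kb_id}/documents",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "files": {"file": file_path},
    }


def search_request(api_key, kb_id, query, top_k=5, rerank=True):
    """POST /v1/knowledge-bases/:kb_id/search — semantic search."""
    return {
        "method": "POST",
        "url": f"{BASE}/knowledge-bases/{kb_id}/search",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"query": query, "top_k": top_k, "rerank": rerank},
    }
```

With a real key and base URL, each dict can be handed to an HTTP client such as `requests`.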
The API handles embedding and reranking automatically. You don't pick models, configure vector dimensions, or manage reranker inference. When better models become available, your search quality improves without a code change. This is the same abstraction that makes integrations with LangChain and LlamaIndex straightforward — your retrieval call stays the same regardless of what runs behind it.
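For the "pass to an LLM" step, a common pattern is to concatenate the top passages into one grounded prompt. A minimal sketch — the `(content, score)` pair mirrors the result fields in the SDK example on this page, but the prompt shape and character budget are assumptions:

```python
def build_context(results, max_chars=4000):
    """Join ranked passages into one context block for an LLM prompt.

    results: list of (content, score) pairs, highest score first.
    Stops adding passages once the character budget is spent.
    """
    parts = []
    used = 0
    for i, (content, score) in enumerate(results, start=1):
        snippet = f"[{i}] (score {score:.2f}) {content}"
        if used + len(snippet) > max_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "\n\n".join(parts)


def build_prompt(question, results):
    """Wrap the retrieved context and the user question into one prompt."""
    return (
        "Answer using only the context below.\n\n"
        f"{build_context(results)}\n\n"
        f"Question: {question}"
    )
```

Because the LLM is outside the retrieval call, swapping providers only changes what you do with the returned string.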
What You Can Build
Document search shows up in more applications than most teams expect. Here are three scenarios where Ragex replaces weeks of custom pipeline work.
Technical documentation portals. A SaaS company with 2,000 pages of product docs, API references, and troubleshooting guides uploads everything to a knowledge base. Their search bar now returns results by meaning, not keywords. A developer typing "rate limits" finds the section titled "API Throttling Configuration" without anyone manually tagging content. Teams building chatbots on top of these docs get the same retrieval quality with zero extra work.
Legal contract review. A compliance team uploads thousands of contracts, policy documents, and regulatory filings. An analyst searching for "indemnification clauses in vendor agreements" gets every relevant passage across the entire corpus — even when the language varies from contract to contract. Tables containing payment terms and liability caps are kept intact, so reviewers see complete information rather than fragments.
Research paper discovery. A biotech company indexes internal research papers, clinical trial summaries, and submission documents. Scientists searching for specific compounds or methodologies find related work across the entire library, including results from scanned PDFs and image-based figures that were processed with OCR. Organizations in healthcare and adjacent fields get particular value from this pattern because of the variety of document formats in play.
Your Documents Work Out of the Box
Document search pipelines break when they encounter file formats their parser doesn't handle. The API supports 16 file types so you don't have to maintain separate parsing logic for each format.
| Tier | File Types | How They're Processed |
|---|---|---|
| Tier 1 (advanced parsing) | PDF, DOCX, PPTX, XLSX, images | Layout-aware extraction with OCR for scanned text, multi-column layouts, and complex tables |
| Tier 2 (direct ingestion) | TXT, MD, HTML, CSV, JSON, TSV, XML | Direct text extraction — fast and lightweight |
For document search specifically, table handling matters. Tables are never split across chunks. A pricing grid, a specifications table, or a comparison matrix stays as a single retrievable unit. When someone searches for specific data points, they get the full table with all its context — not a fragment missing its last two rows.
Maximum file size is 50MB or 500 pages. Technical manuals, legal contracts, research papers, internal wikis, and compliance filings all process without special configuration. If you're building a search interface with the Vercel AI SDK, documents are ready to query as soon as processing completes.
Works With Your Stack
Ragex for document search fits into whatever stack you're already using — it's a building block, not a walled garden.
Frameworks. Use the API as a retrieval backend with LangChain, LlamaIndex, or the Vercel AI SDK. Call the search endpoint, get ranked passages, and pass them to your framework's LLM chain or completion call.
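One way to wire this up is a thin adapter that exposes the search call in the shape a framework expects. The sketch below is duck-typed rather than a real LangChain subclass: `search_fn` stands in for whatever search call your client exposes, and the returned dict shape is an assumption, not LangChain's actual `Document` class.

```python
class RagexRetriever:
    """Thin adapter: wraps a search callable so a framework can use it
    like a retriever.

    search_fn(query, top_k) must return objects with .content and .score
    attributes — the result shape used in the SDK example on this page.
    """

    def __init__(self, search_fn, top_k=5):
        self.search_fn = search_fn
        self.top_k = top_k

    def get_relevant_documents(self, query):
        """Return ranked passages as document-like dicts."""
        results = self.search_fn(query, self.top_k)
        return [
            {"page_content": r.content, "metadata": {"score": r.score}}
            for r in results
        ]
```

Because the adapter owns the translation, swapping frameworks means rewriting one small class, not your retrieval pipeline.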
LLMs. The API returns ranked text passages — you choose which LLM processes them. OpenAI, Anthropic, Cohere, or open-source models all work. Swap models anytime without touching your retrieval setup.
Data sources. Upload documents from anywhere — local file systems, S3 buckets, Google Drive exports, Confluence, Notion. The API processes files, not sources, so your upload mechanism is up to you. Teams building customer support search often start by exporting their help center as HTML.
Our comparison with Pinecone breaks down what you'd need to assemble yourself versus what the managed API handles. If you're exploring other options, the Pinecone alternatives page covers the landscape.
Getting Started
Install the SDK, create a knowledge base, upload a document, and run your first search. The entire flow takes under 5 minutes. You need an API key from the dashboard and a document to upload — anything from a single-page text file to a 500-page PDF works for your first test.
```bash
pip install ragex
```

```python
from ragex import Ragex

client = Ragex(api_key="your-api-key")

# Create a knowledge base
kb = client.knowledge_bases.create(name="product-docs")

# Upload a document
client.documents.upload(
    knowledge_base_id=kb.id,
    file_path="./technical-manual.pdf",
)

# Search with natural language
results = client.search(
    knowledge_base_id=kb.id,
    query="how to configure rate limiting",
    top_k=5,
    rerank=True,
)

for result in results:
    print(result.content, result.score)
```
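If you upload and search in the same script, you may want to wait for processing to finish before the first query. The status call itself is hypothetical — substitute whatever your client exposes — but the polling shape is generic:

```python
import time


def wait_until_ready(get_status, timeout_s=120, poll_s=2.0):
    """Poll get_status() until it returns "ready" or the timeout expires.

    get_status is any zero-argument callable — e.g. a hypothetical
    document-status call from the SDK. Only the polling loop is shown.
    Returns True if the document became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_status() == "ready":
            return True
        time.sleep(poll_s)
    return False
```

Given the processing times below, a two-second poll interval with a two-minute timeout comfortably covers a 10-page PDF.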
Text files process in roughly 4 seconds. A 10-page PDF finishes in under 60 seconds. Once processing completes, search is live — no manual index builds, no configuration steps, no waiting.
Pricing
Recommended for most document search applications: Pro ($79/mo). Most document search projects grow to thousands of pages and up to 15,000 queries per month as adoption spreads. Pro includes 2,000 pages and 15,000 queries — room to grow without overpaying. For larger workloads, the Business plan ($229/mo) provides 6,500 pages and 50,000 queries, and Scale ($499/mo) offers 15,000 pages and 120,000 queries.
Starter ($29/mo) works for prototypes or smaller document collections under 500 pages. If you're exploring whether Ragex compares favorably to building with Pinecone, Starter is a low-commitment way to test with real documents.
Pricing is all-inclusive. Document parsing, embedding, reranking, and search are bundled — no separate bills for vector storage, embedding API calls, or reranker inference.
FAQ
How long does it take to index documents and run the first search?
Under 5 minutes from signup to first query. Create a knowledge base, upload your documents, and the API processes them automatically. Text files take roughly 4 seconds. A 10-page PDF finishes in under 60 seconds. Once processing completes, you can search immediately — no manual index builds or configuration steps required.
What file types does the API support for document search?
The API supports 16 file types across two tiers. Tier 1 files — PDF, DOCX, PPTX, XLSX, and images — are parsed with layout-aware extraction that handles scanned text, multi-column layouts, and complex tables. Tier 2 files — TXT, MD, HTML, CSV, JSON, TSV, and XML — are ingested directly. Maximum file size is 50MB or 500 pages.
How does the API handle tables and structured content inside documents?
Tables are never split across chunks. The parsing pipeline detects table boundaries and keeps each table as a single retrievable unit. This matters for document search because partial tables — a pricing grid missing its last two rows, or a spec sheet cut mid-column — return misleading results. Intact tables mean accurate retrieval.
Can I use this for real-time search in a user-facing application?
Yes. Search responses return fast enough for interactive applications — users won't notice the retrieval step. If your application prioritizes speed over ranking quality, you can disable reranking per request by setting "rerank": false. Both modes return the same result set; reranking only re-orders them by deeper semantic relevance. For high-traffic applications, the internal knowledge base pattern shows how to handle concurrent users at scale.
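In code, the latency-first variant is the same search call with the flag flipped. A sketch — `search_fn` stands in for `client.search` from the Getting Started example so the helper is self-contained:

```python
def fast_search(search_fn, kb_id, query, top_k=5):
    """Latency-first search: identical call, reranking disabled.

    search_fn stands in for the SDK's client.search; results come back
    in the same set, ordered by the base retrieval ranking only.
    """
    return search_fn(
        knowledge_base_id=kb_id,
        query=query,
        top_k=top_k,
        rerank=False,
    )
```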
Last updated: February 20, 2026