RAG API with Next.js
Adding document search to a Next.js application usually means assembling a vector database, embedding pipeline, document parser, and search endpoint — four separate services before you write a single React component. Integrating a RAG API with Next.js replaces this stack with one SDK and three API calls. Upload documents, create a server-side search route, and build a search UI.
Why Next.js developers need a managed retrieval API
Next.js apps commonly need document search for help centers, knowledge bases, documentation sites, and AI-powered product features. Building this from scratch means:
- Provisioning a vector database (Pinecone at $70/mo+, or self-hosted pgvector)
- Setting up document parsing for PDFs, DOCX, and other formats
- Integrating an embedding model (OpenAI embeddings at $50-150/mo)
- Building API routes to orchestrate upload, processing, and search
This infrastructure work delays the feature you actually want to build — the search interface and user experience. Teams building document search features spend weeks on pipeline code that a managed API eliminates.
Setting up the integration
Install the SDK
```bash
npm install ragex
```
The TypeScript SDK uses native fetch with zero external dependencies and ships with full type definitions. It works with Next.js 14+ (App Router and Pages Router).
Create a shared client
```typescript
// lib/rag.ts
import { RagexClient } from 'ragex';

export const ragexClient = new RagexClient({
  apiKey: process.env.RAGEX_API_KEY!,
});

export const KB_ID = process.env.RAG_KB_ID!;
```
Add RAGEX_API_KEY and RAG_KB_ID to your .env.local. The API key must stay server-side — never expose it to the browser.
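A matching `.env.local` might look like this (the values shown are placeholders, not real key formats):

```
# .env.local — read on the server only; never commit this file
RAGEX_API_KEY=your-ragex-api-key
RAG_KB_ID=your-knowledge-base-id
```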
Build the API route
```typescript
// app/api/search/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { ragexClient, KB_ID } from '@/lib/rag';

export async function POST(req: NextRequest) {
  const { query } = await req.json();

  if (!query || typeof query !== 'string') {
    return NextResponse.json({ error: 'Query is required' }, { status: 400 });
  }

  try {
    const results = await ragexClient.search(KB_ID, {
      query,
      top_k: 5,
    });

    return NextResponse.json({
      results: results.results.map(r => ({
        text: r.text,
        score: r.score,
        document: r.document_name,
      })),
    });
  } catch (err) {
    // Don't leak upstream error details to the client
    console.error('Search failed:', err);
    return NextResponse.json({ error: 'Search failed' }, { status: 500 });
  }
}
```
This route accepts a search query, calls Ragex with reranking enabled by default, and returns cleaned results to the frontend. For teams using the Vercel AI SDK, this same pattern works within streaming AI responses.
Build the search component
```tsx
// components/Search.tsx
'use client';
import { useState } from 'react';

type SearchResult = { text: string; score: number; document: string };

export function Search() {
  const [query, setQuery] = useState('');
  const [results, setResults] = useState<SearchResult[]>([]);
  const [loading, setLoading] = useState(false);

  async function handleSearch(e: React.FormEvent) {
    e.preventDefault();
    setLoading(true);
    try {
      const res = await fetch('/api/search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query }),
      });
      const data = await res.json();
      setResults(data.results ?? []);
    } finally {
      setLoading(false);
    }
  }

  return (
    <div>
      <form onSubmit={handleSearch}>
        <input value={query} onChange={e => setQuery(e.target.value)} placeholder="Search docs..." />
        <button type="submit" disabled={loading}>Search</button>
      </form>
      {results.map((r, i) => (
        <div key={i}>
          <p>{r.text}</p>
          <small>{r.document} — Score: {r.score.toFixed(2)}</small>
        </div>
      ))}
    </div>
  );
}
```
Drop this component into any page. The search is fully functional — it queries the API route, which queries Ragex, which returns reranked results from your uploaded documents.
Populating the knowledge base
Upload documents from a setup script, admin page, or CI/CD pipeline. Most teams run an initial bulk upload and then add documents when content changes:
```typescript
// Requires Node 20+, where File is available as a global
import { RagexClient } from 'ragex';
import fs from 'fs';
import path from 'path';

const client = new RagexClient({ apiKey: process.env.RAGEX_API_KEY! });
const KB_ID = process.env.RAG_KB_ID!;

// Upload a single document
const buffer = fs.readFileSync('./docs/product-guide.pdf');
const file = new File([buffer], 'product-guide.pdf', { type: 'application/pdf' });
await client.uploadDocument(KB_ID, file);

// Bulk upload a directory of docs
const docsDir = './docs';
for (const filename of fs.readdirSync(docsDir)) {
  const filePath = path.join(docsDir, filename);
  const buf = fs.readFileSync(filePath);
  const doc = new File([buf], filename);
  await client.uploadDocument(KB_ID, doc);
}
```
The API supports 16 file types — PDFs, DOCX, spreadsheets, images with OCR, markdown, and more. Documents process asynchronously and become searchable once they reach ready status. You can poll the document status or set up a webhook to know when processing completes.
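The exact status-lookup call depends on the document methods your SDK version exposes, but the retry loop itself is generic. A minimal sketch with the status check injected as a callback, so it can wrap whatever lookup your setup provides:

```typescript
// Poll an async status check until it reports "ready", with a bounded
// number of attempts. Resolves true on ready, false on timeout, and
// throws if processing fails outright.
async function waitForReady(
  getStatus: () => Promise<string>,
  { intervalMs = 2000, maxAttempts = 30 } = {}
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === 'ready') return true;
    if (status === 'failed') throw new Error('Document processing failed');
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  return false; // timed out; the document may still be processing
}
```

You would pass in a closure over your SDK's document-status call (the method name here is an assumption): `waitForReady(() => client.getDocument(KB_ID, docId).then(d => d.status))`.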
For teams that update docs frequently — help centers, product changelogs, internal wikis — re-uploading a document replaces the old version. The API re-parses, re-chunks, and re-embeds automatically. No manual re-indexing.
Adding AI-generated answers
Document search returns relevant text chunks, but users often want a direct answer rather than a list of excerpts. Combine retrieval with an LLM to build a Q&A endpoint that synthesizes answers from your documents:
```typescript
// app/api/ask/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { ragexClient, KB_ID } from '@/lib/rag';
import OpenAI from 'openai';

const openai = new OpenAI();

export async function POST(req: NextRequest) {
  const { question } = await req.json();

  const results = await ragexClient.search(KB_ID, { query: question, top_k: 5 });
  const context = results.results.map(r => r.text).join('\n\n');

  const completion = await openai.chat.completions.create({
    model: 'gpt-5-mini',
    messages: [
      { role: 'system', content: `Answer using only this context. If the context doesn't cover the question, say so.\n\n${context}` },
      { role: 'user', content: question },
    ],
  });

  return NextResponse.json({
    answer: completion.choices[0].message.content,
    sources: results.results.map(r => r.document_name),
  });
}
```
This pattern works with any LLM provider — OpenAI, Anthropic, or open-source models. Ragex handles retrieval; you choose the generation model. Including source document names in the response lets your frontend display citations, which builds user trust. Teams building AI assistants commonly extend this pattern with conversation history and streaming responses.
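Extending the endpoint with conversation history mostly comes down to how the messages array is assembled. A sketch of one approach — the message shape matches the OpenAI chat format, and the turn-trimming policy is an assumption for bounding prompt size, not SDK behavior:

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Build the messages array for a follow-up question: retrieved context in
// the system prompt, then the most recent turns, then the new question.
function buildMessages(
  context: string,
  history: ChatMessage[],
  question: string,
  maxTurns = 6
): ChatMessage[] {
  return [
    {
      role: 'system',
      content: `Answer using only this context. If the context doesn't cover the question, say so.\n\n${context}`,
    },
    ...history.slice(-maxTurns), // keep only the tail to bound prompt size
    { role: 'user', content: question },
  ];
}
```

The route would retrieve fresh context for each follow-up question, then pass the accumulated history through this helper before calling the model.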
Caching search results
For documentation sites and help centers where the knowledge base does not change hourly, caching search results reduces RAG API calls and speeds up repeat queries. Next.js provides several caching layers you can use:
```typescript
// app/api/search/route.ts — cacheable variant
// CDNs only cache GET responses, so a cacheable search route should
// accept the query as a search param rather than a POST body.
import { NextRequest, NextResponse } from 'next/server';
import { ragexClient, KB_ID } from '@/lib/rag';

export async function GET(req: NextRequest) {
  const query = req.nextUrl.searchParams.get('q');
  if (!query) {
    return NextResponse.json({ error: 'Query is required' }, { status: 400 });
  }

  const results = await ragexClient.search(KB_ID, {
    query,
    top_k: 5,
  });

  const response = NextResponse.json({ results: results.results });
  response.headers.set('Cache-Control', 's-maxage=3600, stale-while-revalidate=600');
  return response;
}
```
For Server Components, use Next.js unstable_cache to cache search results at the request level. This works well for pages that render search results at build time or ISR — the same query returns cached results for the configured revalidation period. Invalidate the cache when you upload new documents to keep results fresh.
Protecting the search endpoint
In production, your search API route needs authentication and rate limiting to prevent abuse. Next.js middleware handles this cleanly:
```typescript
// middleware.ts
import { NextRequest, NextResponse } from 'next/server';

export function middleware(req: NextRequest) {
  if (req.nextUrl.pathname.startsWith('/api/search')) {
    const authHeader = req.headers.get('authorization');
    if (!authHeader || !authHeader.startsWith('Bearer ')) {
      return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
    }
    // Presence of a Bearer token is not authentication on its own —
    // verify the token against your auth provider or session store here.
  }
  return NextResponse.next();
}
```
For public-facing search (help centers, documentation sites), consider adding rate limiting per IP address instead of requiring authentication. This prevents bots from exhausting your RAG API quota while keeping the search accessible to real users.
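A fixed-window in-memory counter is enough for a single long-running server; serverless platforms that fan out across many instances would need a shared store such as Redis instead, since each instance counts independently. A minimal sketch:

```typescript
// Fixed-window rate limiter keyed by client IP. State lives in module
// scope, so it resets on redeploy and is per-instance on serverless.
const windows = new Map<string, { count: number; resetAt: number }>();

function allowRequest(ip: string, limit = 20, windowMs = 60_000): boolean {
  const now = Date.now();
  const entry = windows.get(ip);
  if (!entry || now >= entry.resetAt) {
    // First request in a fresh window for this IP
    windows.set(ip, { count: 1, resetAt: now + windowMs });
    return true;
  }
  if (entry.count >= limit) return false;
  entry.count++;
  return true;
}
```

In middleware, you would derive the IP from a header such as `x-forwarded-for` and return a 429 response when `allowRequest` returns false.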
Teams building customer support search typically use session-based auth, while internal knowledge base apps use corporate SSO tokens passed through the middleware layer.
Deployment considerations
Ragex works in all Next.js deployment environments (Vercel, self-hosted, Docker). The SDK uses standard HTTP calls that work in Node.js serverless functions, edge functions, and long-running servers.
For Vercel deployments specifically, use the Node.js runtime for search routes (not Edge) since the SDK relies on Node.js APIs. Set environment variables in the Vercel dashboard under Project Settings > Environment Variables. If you use the Vercel AI SDK for streaming, the RAG search call works as a pre-processing step before the stream begins. See also Pinecone alternatives for a broader comparison of managed retrieval options.
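In App Router route handlers, the runtime can be pinned with a route segment config export at the top of the file, so the search route stays on Node.js even if other parts of the app opt into Edge:

```typescript
// app/api/search/route.ts — pin this route to the Node.js runtime
export const runtime = 'nodejs';
```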
Plans start at $29/mo (Starter), with Pro at $79/mo and Scale at $199/mo. For teams evaluating the full stack, see how this compares to building with LangChain or LlamaIndex.
If you are weighing managed RAG against a DIY vector database, the Pinecone comparison breaks down the cost and complexity differences.
FAQ
Does the SDK work with Next.js middleware?
The Ragex client should not run in Edge middleware because it requires Node.js APIs. Use standard Node.js API routes for search endpoints. Edge middleware can handle authentication or rate limiting before the request reaches your search route. This is the same separation of concerns used by teams building production applications — middleware for request filtering, route handlers for data fetching.
Can I use Server Components instead of an API route?
Yes. Call ragexClient.search() directly in a Server Component to render search results on the server. This avoids a client-side fetch but means search happens on page load, which is ideal for SEO-visible content like FAQ pages and documentation. For interactive search with real-time typing, use an API route with a client-side component. You can also combine both approaches — server-render initial results, then enhance with client-side search for subsequent queries.
How do I handle large knowledge bases with many documents?
Upload as many documents as your plan allows — the API handles indexing and search internally. Search performance does not degrade with more documents because the managed infrastructure scales automatically. Use metadata filtering to scope searches to relevant subsets (by category, version, or department). This is the same approach teams use when building multi-tenant chatbots that serve different customers from separate document collections.
Can I use this with the Pages Router instead of App Router?
Yes. Create the API route at pages/api/search.ts using the standard Pages Router handler format (export default function handler(req, res)). The SDK calls are identical — only the route handler syntax changes. If you are migrating from Pages Router to App Router, the search logic moves without modification. The Ragex client initialization, search calls, and response formatting remain the same in both router versions.
Last updated: 2026-03-09