
How to Build an AI Agent with Next.js, LangChain & Supabase

Step-by-step tutorial to build a production AI agent using Next.js, LangChain, and Supabase.
🔥 Advanced — solid JavaScript foundation required
In this tutorial, you'll learn
  • An agent decides tool sequences dynamically — unlike chains, which follow fixed steps
  • maxSteps on the agent prevents infinite loops — always set it to 10 in production
  • Supabase stores conversation history and vector embeddings — the database is the agent's memory
⚡ Quick Answer
  • An AI Agent uses an LLM to decide which tools to call and in what order — unlike a chain, it reasons about actions
  • LangChain provides the agent framework — createToolCallingAgent builds the agent and AgentExecutor runs the ReAct loop (reason, act, observe)
  • Supabase stores conversation history and vector embeddings for long-term memory
  • Next.js 16 Server Actions and Route Handlers stream responses via the Vercel AI SDK (streamText)
  • Token costs scale with conversation length — truncate or summarize history to control spend
  • Biggest mistake: no tool guardrails — agents can call tools infinitely without maxSteps or deduplication
🚨 START HERE
AI Agent Quick Debug Reference
Fast commands for diagnosing agent, tool, and memory issues
🟡 Agent not calling tools
Immediate Action: Check agent prompt and tool descriptions
Commands
grep -rn 'tool\|Tool\|description' lib/agent/ --include='*.ts' | head -20
cat lib/agent/tools.ts | head -60
Fix Now: Verify tools are passed to createToolCallingAgent and each tool has a clear description string
🟡 Supabase vector search returning empty
Immediate Action: Check if embeddings exist in the database
Commands
curl -s "${SUPABASE_URL}/rest/v1/documents?select=count" -H "apikey: ${SUPABASE_ANON_KEY}" | jq
curl -s "${SUPABASE_URL}/rest/v1/rpc/match_documents" -X POST -H "apikey: ${SUPABASE_ANON_KEY}" -H "Content-Type: application/json" -d '{"query_embedding":[0.1,0.2],"match_count":5}' | jq
Fix Now: Verify the match_documents RPC function exists and embeddings were generated with the same model
🟡 Streaming not working
Immediate Action: Check the Route Handler returns a streaming response
Commands
grep -rn 'streamText\|toDataStreamResponse' app/api/ --include='*.ts' | head -10
curl -s -N http://localhost:3000/api/chat -X POST -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"hello"}]}' | head -5
Fix Now: Use streamText from the ai package and ensure the client uses the useChat hook
🟡 Token costs spiking
Immediate Action: Check conversation history length and token usage
Commands
grep -rn 'tokenCount\|token_count\|usage' lib/agent/ --include='*.ts' | head -10
psql "${DATABASE_URL}" -c "SELECT conversation_id, SUM(token_count) as total FROM messages GROUP BY conversation_id ORDER BY total DESC LIMIT 10"
Fix Now: Implement conversation summarization when history exceeds 8,000 tokens
Production Incident: Agent Infinite Loop Cost $4,200 in OpenAI Credits in 3 Hours
A customer support agent entered an infinite tool-calling loop. It kept searching the knowledge base, receiving the same result, and searching again with slightly rephrased queries. The loop ran for 3 hours, consuming 12 million tokens and $4,200 in OpenAI credits.
Symptom: The billing alert fired at 2 AM — OpenAI API spend had exceeded the daily budget by 84x. The agent logs showed 14,000 tool calls for a single conversation. The agent was stuck in a loop: search_knowledge_base -> result -> "I need more information" -> search_knowledge_base -> same result -> repeat.
Assumption: The AgentExecutor default maxIterations (15) would prevent infinite loops. The team assumed the agent would stop after 15 steps and return a partial answer.
Root cause: The team had set maxIterations to 0 (unlimited) during development to debug a complex multi-step workflow and never changed it back. The agent's prompt did not include a fallback instruction for when tools return no new information. The knowledge base search returned the same 3 results every time, but the agent kept rephrasing the query because the prompt said "search until you find the answer." Without a max iteration limit or a "stop if no new information" instruction, the loop continued indefinitely.
Fix: Set maxIterations to 10 for production. Added a tool result deduplication check — if the same search result is returned twice, the agent is instructed to synthesize an answer from available information instead of searching again. Added a hard cost ceiling via OpenAI usage limits ($100/day). Added per-conversation token tracking in Supabase — conversations exceeding 8,000 tokens trigger automatic summarization. Added an alert for any conversation exceeding 20 tool calls.
Key Lesson
  • Always set maxSteps on the agent — unlimited loops will drain your API budget.
  • Include fallback instructions in the agent prompt: what to do when tools return no new information.
  • Set OpenAI usage limits as a safety net — they are the last line of defense against cost overruns.
  • Track per-conversation token usage — summarize or truncate history when it exceeds your budget threshold.
Production Debug Guide: Diagnose agent loops, tool failures, and memory issues
Agent enters infinite tool-calling loop → Check maxSteps on the agent — set to 10 for production. Add deduplication for tool results.
Agent hallucinates tool input parameters → Add Zod schema validation to tool input — reject invalid inputs before the tool executes
Conversation history not persisting across requests → Verify the Supabase client is writing to the messages table on each message — check RLS policies
Streaming response stops mid-token → Check that the Route Handler returns streamText().toDataStreamResponse() and the client uses useChat from the Vercel AI SDK
Agent ignores tools and answers directly → Check the agent prompt — ensure tool descriptions are clear and the prompt instructs the agent to use tools
Vector search returns irrelevant results → Check embedding model consistency — the same model must be used for indexing and querying. Verify the match_threshold parameter (0.78-0.82 for text-embedding-3-small).
Token costs higher than expected → Check conversation history length — long histories multiply token usage. Implement summarization for conversations over 8,000 tokens.

Building an AI agent requires orchestrating four components: the LLM (reasoning engine), tools (external capabilities), memory (conversation history), and a runtime loop (agent executor). LangChain provides the agent framework with ReAct prompting — the model reasons about what to do, selects a tool, observes the result, and decides the next step.

Supabase serves two roles: PostgreSQL stores conversation history for session continuity, and pgvector stores embeddings for semantic search over past interactions. Next.js 16 provides the API layer — Server Actions and Route Handlers process requests, and the Vercel AI SDK streams token-by-token responses to the client.

The production challenges are cost control (token usage scales with history length), latency (multi-step agent loops add sequential delay), reliability (tools can fail, LLMs can hallucinate tool inputs), and observability (debugging why an agent chose a specific action). This guide covers the complete implementation with production patterns for each.

Architecture: Agent, Tools, Memory, and Runtime

An AI agent has four components that work together in a loop. The LLM is the reasoning engine — it reads the conversation history and tool descriptions, decides which tool to call, and interprets the result. Tools are external functions the agent can invoke, such as code execution or API calls. Memory stores conversation history so the agent maintains context across turns. The runtime (AgentExecutor) orchestrates the loop: send context to the LLM, parse the tool call, execute the tool, append the result, repeat.

The key difference between an agent and a chain: a chain follows a fixed sequence of steps (retrieve -> format -> generate). An agent decides the sequence dynamically — it might search, then calculate, then search again, then answer. This flexibility comes at a cost: each step is a separate LLM call, adding latency and token usage.
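
That scaling is worse than linear: each iteration resends the entire accumulated context, so total input tokens grow roughly quadratically with step count. A rough sketch of the arithmetic (the figures are illustrative, not real usage data):

```typescript
// Rough token estimate for an agent loop: step i resends the base context
// plus everything appended by the previous i steps (tool calls + results).
export function estimateLoopTokens(
  baseContext: number,     // tokens in system prompt + history at step 0
  perStepOverhead: number, // tokens added per tool call + tool result
  steps: number
): number {
  let total = 0
  for (let i = 0; i < steps; i++) {
    total += baseContext + i * perStepOverhead
  }
  return total
}

// 10 steps over a 2,000-token context with 500-token tool results:
// 42,500 input tokens for a single user message.
```

This is why the incident below burned credits so fast: each rephrased search added another full-context LLM call on top of an already-growing history.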

LangChain's ReAct agent type implements this pattern. The prompt instructs the LLM to Think (reason about what to do), Act (call a tool with specific inputs), and Observe (read the tool result). The loop continues until the agent decides it has enough information to answer, or maxSteps is reached.

io.thecodeforge.ai-agent.architecture.ts Β· TYPESCRIPT
// ============================================
// AI Agent Architecture — Core Components (2026 LangChain 0.3+)
// ============================================

// ---- Component 1: LLM (Reasoning Engine) ----
// The LLM decides which tools to call and interprets results

import { ChatOpenAI } from '@langchain/openai'

const llm = new ChatOpenAI({
  modelName: 'gpt-4o',
  temperature: 0, // Deterministic for tool-calling — reduces hallucination
  openAIApiKey: process.env.OPENAI_API_KEY,
})

// ---- Component 2: Tools (External Capabilities) ----
// Each tool has a name, description, schema, and execution function

import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: 'Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.',
  schema: z.object({
    query: z.string().describe('The search query — use keywords from the user question'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general']).optional()
      .describe('Filter by category if the topic is clear'),
  }),
  func: async ({ query, category }) => {
    // Search Supabase vector store
    const results = await searchDocuments(query, category)
    if (results.length === 0) {
      return 'No results found. Try a different search query or answer from general knowledge.'
    }
    return results.map((r) => `[${r.title}]: ${r.content}`).join('\n\n')
  },
})

const getOrderStatus = new DynamicStructuredTool({
  name: 'get_order_status',
  description: 'Look up the status of a customer order by order ID. Returns shipping status, estimated delivery, and tracking number.',
  schema: z.object({
    orderId: z.string().describe('The order ID — format: ORD-XXXXX'),
  }),
  func: async ({ orderId }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) {
      return `Order ${orderId} not found. Ask the user to verify the order ID.`
    }
    return JSON.stringify({
      status: order.status,
      estimatedDelivery: order.estimatedDelivery,
      trackingNumber: order.trackingNumber,
    })
  },
})

const calculateRefund = new DynamicStructuredTool({
  name: 'calculate_refund',
  description: 'Calculate the refund amount for a return. Considers order total, return reason, and days since purchase.',
  schema: z.object({
    orderId: z.string().describe('The order ID'),
    returnReason: z.enum(['defective', 'wrong_item', 'changed_mind', 'not_as_described'])
      .describe('The reason for the return'),
  }),
  func: async ({ orderId, returnReason }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) return 'Order not found.'

    const daysSincePurchase = Math.floor(
      (Date.now() - new Date(order.createdAt).getTime()) / (1000 * 60 * 60 * 24)
    )

    let refundPercentage = 1.0
    if (returnReason === 'changed_mind' && daysSincePurchase > 30) {
      refundPercentage = 0.0
    } else if (returnReason === 'changed_mind') {
      refundPercentage = 0.85
    }

    const refundAmount = order.total * refundPercentage
    return JSON.stringify({ refundAmount, refundPercentage, daysSincePurchase })
  },
})

// ---- Component 3: Memory (Conversation History) ----
// Stored in Supabase PostgreSQL for persistence across requests
// Custom history class to match your token-tracking schema
import { SupabaseChatMessageHistory } from '@langchain/community/stores/message/supabase'

function createMemory(conversationId: string) {
  return new SupabaseChatMessageHistory({
    supabaseClient: supabase,
    tableName: 'messages',
    sessionId: conversationId,
  })
}

// ---- Component 4: Agent + Executor (2026 pattern) ----
// createToolCallingAgent builds the agent, AgentExecutor runs the loop,
// RunnableWithMessageHistory adds persistent memory
import { createToolCallingAgent, AgentExecutor } from 'langchain/agents'
import { RunnableWithMessageHistory } from '@langchain/core/runnables'
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'

const prompt = ChatPromptTemplate.fromMessages([
  ['system', `You are a helpful customer support agent for Acme Corp.

You have access to the following tools:
- search_knowledge_base: Search the internal knowledge base
- get_order_status: Look up order status by order ID
- calculate_refund: Calculate refund amounts for returns

Rules:
1. Always search the knowledge base before answering product questions
2. If a tool returns no new information, answer from what you know
3. Never make up order statuses — always use get_order_status
4. If you cannot answer, say so and offer to connect with a human agent
5. Maximum 10 tool calls per response — synthesize your answer after that`],
  new MessagesPlaceholder('chat_history'),
  ['human', '{input}'],
  new MessagesPlaceholder('agent_scratchpad'),
])

async function createAgent(conversationId: string) {
  const tools = [searchKnowledgeBase, getOrderStatus, calculateRefund]

  const agent = await createToolCallingAgent({
    llm,
    tools,
    prompt,
  })

  // The executor runs the think/act/observe loop. maxIterations is the
  // guardrail against the infinite-loop incident described above.
  const executor = new AgentExecutor({
    agent,
    tools,
    maxIterations: 10,
  })

  const agentWithHistory = new RunnableWithMessageHistory({
    runnable: executor,
    getMessageHistory: () => createMemory(conversationId),
    inputMessagesKey: 'input',
    historyMessagesKey: 'chat_history',
  })

  return agentWithHistory
}
Mental Model
The Agent Loop: Think, Act, Observe
An agent repeats a three-step loop: reason about what to do, call a tool, observe the result — until it has enough information to answer.
  • Think: the LLM reads the conversation history and tool descriptions, decides which tool to call
  • Act: the agent calls the tool with specific inputs parsed from the LLM output
  • Observe: the tool result is appended to the conversation, and the LLM decides the next step
  • Each step is a separate LLM call — latency and cost scale with the number of iterations
  • maxSteps stops the loop — without it, the agent can run indefinitely and drain API credits
📊 Production Insight
Each agent step is a separate LLM call — 10 iterations means 10 API calls per user message.
maxSteps is the safety net — without it, agents loop infinitely and drain API credits.
Rule: set maxSteps to 10 for production, add cost monitoring per conversation.
🎯 Key Takeaway
An agent decides tool sequences dynamically — unlike chains, which follow fixed steps.
createToolCallingAgent + RunnableWithMessageHistory orchestrates the loop — maxSteps prevents infinite loops and cost overruns.
Each iteration is a separate LLM call — cost and latency scale with step count.

Supabase: Memory, Vector Search, and Conversation Storage

Supabase serves three roles in the agent architecture. PostgreSQL stores conversation history — every message (user and assistant) is persisted for session continuity. pgvector stores document embeddings for semantic search — the agent can retrieve relevant knowledge base entries. The Supabase client provides real-time subscriptions if you want to show agent activity to the user in real time.

The conversation storage pattern: each conversation has a unique ID. Messages are appended to a messages table with the conversation_id as a foreign key. When the agent processes a new message, it loads the conversation history from Supabase and prepends it to the LLM context. This gives the agent memory across turns without storing state in the application server.
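
A sketch of that read path, assuming the messages table from the schema in this section. The Supabase client is passed in (typed loosely to keep the sketch small), and the pure row-to-message mapping is a separate function so it can be tested on its own:

```typescript
// Map stored rows to chat messages for the LLM context. Tool and system
// rows stay in the database for auditing but are excluded from the context here.
type MessageRow = { role: string; content: string }

export function rowsToChatMessages(rows: MessageRow[]) {
  return rows
    .filter((r) => r.role === 'user' || r.role === 'assistant')
    .map((r) => ({ role: r.role, content: r.content }))
}

// Load a conversation's history, oldest first. `supabase` is any
// initialized Supabase client.
export async function loadHistory(supabase: any, conversationId: string) {
  const { data, error } = await supabase
    .from('messages')
    .select('role, content')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: true })

  if (error) throw new Error(`Failed to load history: ${error.message}`)
  return rowsToChatMessages(data ?? [])
}
```

Whether tool messages belong in the replayed context is a design choice; excluding them, as here, keeps the context small at the cost of the agent forgetting raw tool outputs between turns.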

Vector search uses pgvector's cosine similarity. Documents are chunked, embedded with OpenAI's text-embedding-3-small model, and stored with their embedding vectors. The match_documents RPC function performs nearest-neighbor search and returns the top-k results above a similarity threshold (use 0.78–0.82 for text-embedding-3-small in production).
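
The similarity score the RPC returns is plain cosine similarity: pgvector's <=> operator computes cosine distance, and the function's similarity column is 1 - distance. The helper below is an illustrative TypeScript re-implementation of that score, not something you would run in production:

```typescript
// Cosine similarity between two equal-length vectors: 1.0 = identical
// direction, 0.0 = orthogonal. Mirrors `1 - (a <=> b)` in match_documents.
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Under this metric a match at 0.80 clears the production threshold, while unrelated (near-orthogonal) embeddings score close to 0, which is why the threshold filter discards them.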

io.thecodeforge.ai-agent.supabase-schema.sql Β· SQL
-- ============================================
-- Supabase Schema for AI Agent
-- ============================================

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- ---- Conversations Table ----
-- Each conversation has a unique ID and tracks the user
CREATE TABLE public.conversations (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  title TEXT,
  total_tokens INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.conversations ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage their own conversations"
  ON public.conversations
  FOR ALL
  USING (auth.uid() = user_id)
  WITH CHECK (auth.uid() = user_id);

-- ---- Messages Table ----
-- Stores each message in a conversation
CREATE TABLE public.messages (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system', 'tool')),
  content TEXT NOT NULL,
  tool_calls JSONB, -- Stores tool call details for assistant messages
  tool_call_id TEXT, -- Links tool response to the original call
  token_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.messages ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage messages in their conversations"
  ON public.messages
  FOR ALL
  USING (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  )
  WITH CHECK (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  );

-- Index for fast conversation history retrieval
CREATE INDEX idx_messages_conversation_created
  ON public.messages (conversation_id, created_at ASC);

-- ---- Documents Table ----
-- Knowledge base with vector embeddings
CREATE TABLE public.documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  category TEXT, -- product, pricing, troubleshooting, general
  embedding vector(1536), -- text-embedding-3-small produces 1536-dim vectors
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Documents are readable by authenticated users"
  ON public.documents
  FOR SELECT
  USING (auth.role() = 'authenticated');

-- Index for vector similarity search
CREATE INDEX ON public.documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- ---- Vector Search RPC Function ----
-- Performs nearest-neighbor search using cosine similarity
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_count INTEGER DEFAULT 5,
  match_threshold FLOAT DEFAULT 0.8, -- Production value for text-embedding-3-small
  filter_category TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  category TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.title,
    documents.content,
    documents.category,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM public.documents
  WHERE
    (filter_category IS NULL OR documents.category = filter_category)
    AND 1 - (documents.embedding <=> query_embedding) > match_threshold
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

-- ---- Conversation Summaries Table ----
-- Stores summarized history for long conversations
CREATE TABLE public.conversation_summaries (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  summary TEXT NOT NULL,
  message_count INTEGER NOT NULL, -- Number of messages summarized
  token_count INTEGER NOT NULL, -- Token count of the original messages
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- ---- Updated At Trigger ----
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$;

CREATE TRIGGER conversations_updated_at
  BEFORE UPDATE ON public.conversations
  FOR EACH ROW
  EXECUTE FUNCTION update_updated_at();
Mental Model
Supabase as the Agent's Memory Layer
Supabase provides three memory capabilities: conversation storage, semantic search, and session persistence across requests.
  • PostgreSQL stores conversation history — every message persisted for session continuity
  • pgvector stores document embeddings — agent retrieves relevant knowledge via semantic search
  • Conversation summaries compress long histories — control token costs for extended sessions
  • RLS policies ensure users only access their own conversations — security at the database level
  • The match_documents RPC function performs nearest-neighbor search with configurable threshold (0.78-0.82 recommended)
📊 Production Insight
pgvector cosine similarity search retrieves relevant documents — the threshold filters low-quality matches (use 0.78-0.82 for text-embedding-3-small).
Conversation history grows linearly with turns — summarize after 8,000 tokens to control costs.
Rule: store every message, track token counts per conversation, summarize when history is too long.
🎯 Key Takeaway
Supabase serves three roles: conversation storage, vector search, and session persistence.
The pgvector match_documents RPC performs semantic search — cosine similarity with threshold filtering.
Summarize long conversations to control token costs — store summaries in a separate table.
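
The summarization trigger itself is simple bookkeeping. A minimal sketch, assuming per-message token counts are stored as in the schema above (the summarize step itself would be one more LLM call, omitted here; the threshold and keep-count values follow this guide's recommendations):

```typescript
// Decide which messages to compress when a conversation's history exceeds
// the token budget. The most recent messages are always kept verbatim.
const TOKEN_BUDGET = 8000
const KEEP_RECENT = 10

type CountedMessage = { role: string; content: string; tokenCount: number }

export function splitForSummarization(history: CountedMessage[]) {
  const totalTokens = history.reduce((sum, m) => sum + m.tokenCount, 0)
  if (totalTokens <= TOKEN_BUDGET || history.length <= KEEP_RECENT) {
    return { toSummarize: [] as CountedMessage[], toKeep: history, totalTokens }
  }
  return {
    toSummarize: history.slice(0, -KEEP_RECENT), // becomes one summary row
    toKeep: history.slice(-KEEP_RECENT),
    totalTokens,
  }
}
```

The toSummarize slice is what you would send to the LLM for compression and then insert into conversation_summaries; the next request loads the summary plus toKeep instead of the full history.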

Next.js Integration: Streaming API and Server Actions

The Next.js integration has two concerns: the API layer that processes agent requests, and the streaming layer that delivers token-by-token responses to the client. The Vercel AI SDK (ai package) handles both — it provides useChat for the client and streamText with toDataStreamResponse for the server.

The API route receives the user's message, loads conversation history from Supabase, runs the agent, and streams the response. The client uses useChat to manage the message list and display streaming tokens. The key pattern: the Route Handler returns streamText(...).toDataStreamResponse(), which is a streaming response that the client consumes incrementally.

For non-streaming use cases (background processing, webhook-triggered agents), use Server Actions instead. They run the agent synchronously and return the result. Server Actions are simpler but do not support streaming — the user waits for the full response.
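
For reference, a minimal Route Handler of the shape the useChat client expects. This sketch calls the model directly through the Vercel AI SDK rather than through the LangChain agent, and assumes the ai and @ai-sdk/openai packages; wiring the LangChain executor into the stream is left to your setup:

```typescript
// File: app/api/chat/route.ts (minimal streaming chat endpoint sketch)
// Assumes the Vercel AI SDK (ai package) and the @ai-sdk/openai provider.
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export async function POST(req: Request) {
  const { messages } = await req.json()

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    maxSteps: 10, // same guardrail discussed above: caps tool-calling rounds
    // tools: { ... } // register tools here if using AI SDK tool calling
  })

  // Streaming response consumed incrementally by useChat on the client
  return result.toDataStreamResponse()
}
```

In a full implementation you would also authenticate the user, load history from Supabase, and persist both messages, as the Server Action below does for the non-streaming path.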

io.thecodeforge.ai-agent.client-component.tsx Β· TSX
// ============================================
// Client Component — Chat Interface with Streaming
// ============================================
// File: components/chat-interface.tsx

'use client'

import { useChat } from 'ai/react'
import { useState } from 'react'
import { Button } from '@/components/ui/button'
import { Input } from '@/components/ui/input'

export function ChatInterface({ conversationId }: { conversationId?: string }) {
  const [activeConversationId, setActiveConversationId] = useState(conversationId)

  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
    body: { conversationId: activeConversationId },
    onFinish: (message) => {
      // Extract conversation ID from response headers
      // (set by the server on the streaming response)
    },
    onError: (error) => {
      console.error('Chat error:', error)
    },
  })

  return (
    <div className="flex flex-col h-[600px] max-w-2xl mx-auto">
      {/* Messages */}
      <div className="flex-1 overflow-y-auto space-y-4 p-4">
        {messages.map((message) => (
          <div
            key={message.id}
            className={`flex ${message.role === 'user' ? 'justify-end' : 'justify-start'}`}
          >
            <div
              className={`max-w-[80%] rounded-lg px-4 py-2 ${
                message.role === 'user'
                  ? 'bg-primary text-primary-foreground'
                  : 'bg-muted'
              }`}
            >
              <p className="text-sm whitespace-pre-wrap">{message.content}</p>
            </div>
          </div>
        ))}

        {isLoading && (
          <div className="flex justify-start">
            <div className="bg-muted rounded-lg px-4 py-2">
              <div className="flex space-x-1">
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce" />
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce delay-100" />
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce delay-200" />
              </div>
            </div>
          </div>
        )}
      </div>

      {/* Input */}
      <form onSubmit={handleSubmit} className="flex gap-2 p-4 border-t">
        <Input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask a question..."
          disabled={isLoading}
          className="flex-1"
        />
        <Button type="submit" disabled={isLoading}>
          Send
        </Button>
      </form>
    </div>
  )
}

// ============================================
// Server Action — Non-Streaming Agent (Background Processing)
// ============================================
// File: app/actions/agent.ts

'use server'

import { createAgent } from '@/lib/agent/factory'
import { createClient } from '@/lib/supabase/server'

export async function processAgentMessage(
  conversationId: string,
  message: string
) {
  const supabase = await createClient()
  const { data: { user } } = await supabase.auth.getUser()

  if (!user) {
    throw new Error('Unauthorized')
  }

  // Store user message
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'user',
    content: message,
  })

  // Run agent (non-streaming)
  const agent = await createAgent(conversationId)
  const result = await agent.invoke(
    { input: message },
    { configurable: { sessionId: conversationId } }
  )

  // Store assistant response
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'assistant',
    content: result.output,
  })

  return {
    answer: result.output,
    steps: result.intermediateSteps?.length ?? 0,
  }
}
⚠ Streaming vs Server Actions: When to Use Each
📊 Production Insight
Route Handlers with streamText().toDataStreamResponse() enable real-time token streaming to the client.
Server Actions are simpler but cannot stream — use them for background processing only.
Rule: use Route Handlers for user-facing chat, Server Actions for background agent tasks.
🎯 Key Takeaway
Streaming requires Route Handler + streamText() + useChat on the client.
Server Actions run agents synchronously — no streaming, simpler but slower perceived performance.
Store every message in Supabase — conversation history enables session continuity.

Tools: Building and Validating Agent Capabilities

Tools are the agent's external capabilities — each tool is a function with a name, description, and input schema. The LLM reads the tool descriptions to decide which tool to call, and parses the input schema to generate valid arguments. The quality of the description directly affects the agent's ability to use the tool correctly.

The critical pattern: validate tool inputs with Zod before execution. LLMs can hallucinate invalid inputs — missing required fields, wrong types, or malformed queries. Zod validation catches these before the tool executes, returning a clear error message that the agent can learn from.

Tool design principles: descriptions should be specific (not just "search the database"), input schemas should use .describe() on every field (the LLM reads these descriptions), and error messages should be actionable (tell the agent what went wrong and how to fix it).

io.thecodeforge.ai-agent.tools.ts Β· TYPESCRIPT
// ============================================
// Tool Design Patterns
// ============================================

import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

// ---- Good Tool: Clear description, validated inputs, actionable errors + deduplication ----

export const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: `Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.
Do NOT use this tool for:
- Order status queries (use get_order_status instead)
- Calculations (use calculate tool instead)
- Questions about the current date or time`,
  schema: z.object({
    query: z.string()
      .min(3, 'Query must be at least 3 characters')
      .max(200, 'Query must be under 200 characters')
      .describe('The search query — use specific keywords from the user question. Avoid full sentences.'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general'])
      .optional()
      .describe('Filter by category if the topic is clear from the question'),
    maxResults: z.number()
      .min(1)
      .max(10)
      .default(5)
      .describe('Maximum number of results to return — use 3 for focused answers, 5 for comprehensive'),
  }),
  func: async ({ query, category, maxResults }) => {
    try {
      // Deduplication guardrail: skip repeated identical queries.
      // Module-level state for brevity; scope per conversation in production.
      const seen: Set<string> = ((globalThis as any).seenResults ??= new Set())
      const queryKey = JSON.stringify({ query, category })
      if (seen.has(queryKey)) {
        return "You've already seen this result. Answer from context or ask for clarification."
      }
      seen.add(queryKey)

      // Generate embedding for the query
      const embedding = await generateEmbedding(query)

      // Search Supabase vector store
      const { data, error } = await supabase.rpc('match_documents', {
        query_embedding: embedding,
        match_count: maxResults,
        match_threshold: 0.8, // Production value
        filter_category: category ?? null,
      })

      if (error) {
        return `Search failed: ${error.message}. Try a simpler query or remove the category filter.`
      }

      if (!data || data.length === 0) {
        return `No results found for "${query}". Try:
1. Using different keywords
2. Removing the category filter
3. Broadening the search terms`
      }

      return data.map((doc: any, i: number) =>
        `[Result ${i + 1}] (similarity: ${doc.similarity.toFixed(2)})
Title: ${doc.title}
Content: ${doc.content.slice(0, 500)}...`
      ).join('\n\n')
    } catch (err) {
      return `Search error: ${err instanceof Error ? err.message : 'Unknown error'}. The user should try again or contact support.`
    }
  },
})

// ---- Good Tool: Input validation with meaningful error messages ----

export const createTicket = new DynamicStructuredTool({
  name: 'create_support_ticket',
  description: 'Create a support ticket when the agent cannot resolve the issue. Use this as a last resort after exhausting available tools.',
  schema: z.object({
    subject: z.string()
      .min(10, 'Subject must be at least 10 characters β€” provide a clear summary')
      .max(200)
      .describe('Brief subject line summarizing the issue'),
    description: z.string()
      .min(50, 'Description must be at least 50 characters β€” include what was tried and what failed')
      .max(2000)
      .describe('Detailed description including what the user tried and what went wrong'),
    priority: z.enum(['low', 'medium', 'high', 'urgent'])
      .describe('Priority based on impact: low = question, medium = minor issue, high = blocked, urgent = production down'),
    category: z.enum(['billing', 'technical', 'account', 'feature_request'])
      .describe('Category of the issue'),
  }),
  func: async ({ subject, description, priority, category }) => {
    // Validate business rules
    if (priority === 'urgent' && category !== 'technical') {
      return 'Urgent priority is only available for technical issues. Please use high priority instead.'
    }

    const ticket = await createTicketInDatabase({ subject, description, priority, category })

    return `Ticket created successfully.
Ticket ID: ${ticket.id}
Priority: ${priority}
Expected response time: ${getResponseTime(priority)}

Tell the user: "I've created a support ticket (ID: ${ticket.id}) for you. A team member will respond within ${getResponseTime(priority)}."`
  },
})

// ---- Helper: Generate embedding ----
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  })
  return response.data[0].embedding
}

function getResponseTime(priority: string): string {
  const times: Record<string, string> = {
    low: '48 hours',
    medium: '24 hours',
    high: '4 hours',
    urgent: '1 hour',
  }
  return times[priority] ?? '24 hours'
}
πŸ’‘Tool Design Best Practices
  • Descriptions must explain WHEN to use the tool β€” not just what it does
  • Every input field needs .describe() β€” the LLM reads these to generate valid arguments
  • Validate inputs with Zod before execution β€” LLMs can hallucinate invalid inputs
  • Error messages must be actionable β€” tell the agent what went wrong and how to retry
  • Return formatted text, not raw JSON β€” LLMs parse natural language better than JSON
πŸ“Š Production Insight
Tool descriptions are the LLM's instructions β€” vague descriptions cause wrong tool selection.
Zod validation catches hallucinated inputs before execution β€” return clear error messages.
Rule: describe when to use the tool, validate every input, return actionable error messages.
🎯 Key Takeaway
Tool quality determines agent quality β€” descriptions guide the LLM's tool selection.
Validate inputs with Zod β€” LLMs hallucinate invalid arguments, especially in multi-step workflows.
Error messages must guide the agent β€” tell it what failed and how to retry with different inputs.
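The search tool above calls a match_documents RPC. If it was not already created in the earlier setup sections, that Postgres function typically follows the standard Supabase pgvector pattern. A sketch, assuming a documents table with embedding vector(1536), title, content, and category columns:

```sql
-- Cosine-similarity search over document embeddings (pgvector).
-- The <=> operator is cosine distance; similarity = 1 - distance.
create or replace function match_documents(
  query_embedding vector(1536),
  match_count int,
  match_threshold float,
  filter_category text default null
) returns table (id uuid, title text, content text, similarity float)
language sql stable as $$
  select d.id, d.title, d.content,
         1 - (d.embedding <=> query_embedding) as similarity
  from documents d
  where (filter_category is null or d.category = filter_category)
    and 1 - (d.embedding <=> query_embedding) > match_threshold
  order by d.embedding <=> query_embedding
  limit match_count;
$$;
```

The function signature matches the arguments the tool passes to supabase.rpc: query_embedding, match_count, match_threshold, and filter_category.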

Cost Control: Token Management and Conversation Summarization

Token costs are the primary production concern for AI agents. Each agent iteration is a separate LLM call that includes the full conversation history, tool descriptions, and the agent's reasoning. A 20-turn conversation with 10 tool calls per turn can consume 100,000+ tokens β€” costing $1-10 depending on the model.
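To make the arithmetic concrete, here is a back-of-the-envelope model. All counts are illustrative assumptions; the pricing matches the gpt-4o rates used later in this section:

```typescript
// Back-of-the-envelope cost model (illustrative numbers, gpt-4o pricing)
const calls = 20 * 10                      // 20 turns x 10 tool iterations = 200 LLM calls
const inputTokens = calls * 20_000         // full history resent on each call: 4,000,000 tokens
const outputTokens = calls * 500           // 100,000 output tokens

const INPUT_PER_TOKEN = 2.5 / 1_000_000   // $2.50 per million input tokens
const OUTPUT_PER_TOKEN = 10 / 1_000_000   // $10.00 per million output tokens

const cost = inputTokens * INPUT_PER_TOKEN + outputTokens * OUTPUT_PER_TOKEN
console.log(`$${cost.toFixed(2)}`) // → "$11.00" for one long conversation
```

The dominant term is input tokens, because the full history is resent on every iteration; that is why the strategies below all target history size.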

Three strategies control costs: history truncation (keep only the last N messages), conversation summarization (replace old messages with a summary), and model selection (use gpt-4o-mini for simple tasks, gpt-4o for complex reasoning). Summarization is the most effective β€” it preserves context while reducing token count by 80-90%.
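The first strategy, history truncation, is simple enough to sketch in a few lines. The Msg shape here is hypothetical; real rows come from the Supabase messages table:

```typescript
// Hypothetical message shape; real rows come from the Supabase `messages` table
type Msg = { role: string; content: string; token_count: number }

// Keep the most recent messages that fit the token budget,
// walking backwards from the newest message
function truncateHistory(messages: Msg[], maxTokens: number): Msg[] {
  const kept: Msg[] = []
  let total = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    if (total + messages[i].token_count > maxTokens) break
    kept.unshift(messages[i])
    total += messages[i].token_count
  }
  return kept
}

// A 3-message history with a 250-token budget keeps only the last two messages
const history = [
  { role: 'user', content: 'a', token_count: 100 },
  { role: 'assistant', content: 'b', token_count: 100 },
  { role: 'user', content: 'c', token_count: 100 },
]
console.log(truncateHistory(history, 250).length) // → 2
```

Truncation is cheap but lossy: everything outside the window is forgotten, which is why summarization is usually the better default.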

io.thecodeforge.ai-agent.cost-control.ts Β· TYPESCRIPT
// ============================================
// Cost Control: Token Management
// ============================================

import { ChatOpenAI } from '@langchain/openai'
import { createClient } from '@/lib/supabase/server'

// ---- Strategy 1: Conversation Summarization ----
// Summarize old messages when the conversation gets too long
// Reduces token count by 80-90% while preserving context

export async function getConversationHistory(conversationId: string, maxTokens: number = 8000) {
  const supabase = await createClient()

  // Get the latest summary (if any). maybeSingle() returns null instead of
  // erroring when no summary row exists yet
  const { data: summary } = await supabase
    .from('conversation_summaries')
    .select('summary, message_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: false })
    .limit(1)
    .maybeSingle()

  // Get messages after the summary
  const offset = summary?.message_count ?? 0
  const { data: recentMessages } = await supabase
    .from('messages')
    .select('role, content, token_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: true })
    .range(offset, offset + 49)

  if (!recentMessages) return []

  // Build the history
  const history: { role: string; content: string }[] = []

  // Add summary as a system message if it exists
  if (summary) {
    history.push({
      role: 'system',
      content: `[Previous conversation summary]: ${summary.summary}`,
    })
  }

  // Add recent messages
  let totalTokens = summary ? estimateTokens(summary.summary) : 0
  for (const msg of recentMessages) {
    if (totalTokens + (msg.token_count ?? 0) > maxTokens) {
      break // Stop adding messages when we hit the token limit
    }
    history.push({ role: msg.role, content: msg.content })
    totalTokens += msg.token_count ?? 0
  }

  return history
}

// ---- Helper: Rough token estimate ----
// ~4 characters per token is a reasonable approximation for English text;
// use a real tokenizer (e.g. tiktoken) when accuracy matters
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// ---- Strategy 2: Model Selection Based on Task Complexity ----
// Use cheaper models for simple tasks, expensive models for complex reasoning

export function selectModel(taskComplexity: 'simple' | 'moderate' | 'complex') {
  const models = {
    simple: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 500, // Limit output tokens for simple responses
    }),
    moderate: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 1000,
    }),
    complex: new ChatOpenAI({
      modelName: 'gpt-4o',
      temperature: 0,
      maxTokens: 2000,
    }),
  }
  return models[taskComplexity]
}

// ---- Strategy 3: Token Budget Per Conversation ----
// Track token usage per conversation; when the budget is exceeded,
// summarize the history (rather than refusing the request) and continue

export async function checkTokenBudget(conversationId: string): Promise<boolean> {
  const supabase = await createClient()
  const MAX_TOKENS_PER_CONVERSATION = 50000

  const { data: conversation } = await supabase
    .from('conversations')
    .select('total_tokens')
    .eq('id', conversationId)
    .single()

  if (!conversation) return true

  if (conversation.total_tokens >= MAX_TOKENS_PER_CONVERSATION) {
    // Summarize and continue
    await summarizeConversation(conversationId)
    return true
  }

  return true
}

// ---- Cost Estimation ----
// Estimate cost before running the agent

const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.50 / 1_000_000, output: 10.00 / 1_000_000 },
  'gpt-4o-mini': { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 },
  'text-embedding-3-small': { input: 0.02 / 1_000_000, output: 0 },
}

export function estimateCost(
  modelName: string,
  inputTokens: number,
  outputTokens: number
): number {
  const costs = MODEL_COSTS[modelName] ?? MODEL_COSTS['gpt-4o']
  return (inputTokens * costs.input) + (outputTokens * costs.output)
}

// ---- Usage Tracking ----
// Store token usage per conversation for billing and monitoring

export async function trackUsage(
  conversationId: string,
  modelName: string,
  inputTokens: number,
  outputTokens: number
) {
  const supabase = await createClient()
  const cost = estimateCost(modelName, inputTokens, outputTokens)

  await supabase.from('usage_logs').insert({
    conversation_id: conversationId,
    model: modelName,
    input_tokens: inputTokens,
    output_tokens: outputTokens,
    cost,
  })

  // Update the conversation total atomically via a Postgres function.
  // Note: nesting supabase.rpc() inside .update() does not work; call it directly
  await supabase.rpc('increment_conversation_tokens', {
    conv_id: conversationId,
    tokens: inputTokens + outputTokens,
  })
}
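The summarizeConversation function called in checkTokenBudget is not shown above. A minimal sketch, assuming the messages and conversation_summaries tables from the earlier schema and gpt-4o-mini as the cheap summarizer:

```typescript
// Sketch: summarize a conversation's history and store the summary.
// Assumes the `messages` and `conversation_summaries` tables from earlier sections.
import { ChatOpenAI } from '@langchain/openai'
import { createClient } from '@/lib/supabase/server'

export async function summarizeConversation(conversationId: string) {
  const supabase = await createClient()

  const { data: messages } = await supabase
    .from('messages')
    .select('role, content')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: true })

  if (!messages || messages.length === 0) return

  const transcript = messages.map((m) => `${m.role}: ${m.content}`).join('\n')

  // Use the cheap model for summarization
  const llm = new ChatOpenAI({ modelName: 'gpt-4o-mini', temperature: 0 })
  const result = await llm.invoke(
    `Summarize this support conversation in under 200 words, preserving user goals, decisions, and unresolved issues:\n\n${transcript}`
  )

  await supabase.from('conversation_summaries').insert({
    conversation_id: conversationId,
    summary: typeof result.content === 'string' ? result.content : JSON.stringify(result.content),
    message_count: messages.length, // getConversationHistory uses this as its offset
  })
}
```

Storing message_count alongside the summary is what lets getConversationHistory skip the already-summarized messages.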
⚠ Token Costs Scale with Conversation Length
πŸ“Š Production Insight
Each agent iteration includes the full conversation history β€” costs compound with every turn.
Summarization reduces token count by 80-90% while preserving key context.
Rule: summarize after 8,000 tokens, use gpt-4o-mini for simple tasks, track per-conversation costs.
🎯 Key Takeaway
Token costs are the primary production concern β€” each iteration sends full history to the LLM.
Summarization, model selection, and token budgets control spend β€” implement all three.
Track per-conversation usage in Supabase β€” set hard limits and alert on exceedance.

Deployment: Environment, Monitoring, and Failure Handling

Production deployment requires three additions beyond the development setup: environment variable management (API keys, Supabase credentials), monitoring (token usage, error rates, latency), and failure handling (tool errors, LLM timeouts, rate limits).

The deployment target matters: Vercel Serverless Functions default to a 10-second timeout (configurable up to 300 seconds on the Pro plan). Agent loops with multiple tool calls can exceed this. For long-running agents, stream from Vercel's Edge Runtime (the first byte must be sent within roughly 25 seconds, but an open stream can keep running) or deploy the agent logic to a separate service (AWS Lambda with a 15-minute timeout, or a containerized service with no timeout at all).
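Where the workload fits within plan limits, the timeout can also be raised per route using Next.js route segment config. A sketch (the values are illustrative; the allowed maximum depends on your Vercel plan):

```typescript
// app/api/chat/route.ts: route segment config for the agent endpoint
// (illustrative values; the permitted maxDuration depends on your Vercel plan)
export const maxDuration = 300 // seconds
export const runtime = 'nodejs' // switch to 'edge' for streaming-first agents
```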

io.thecodeforge.ai-agent.deployment.ts Β· TYPESCRIPT
// ============================================
// Deployment Configuration
// ============================================

// ---- Environment Variables (.env.local) ----
// NEVER commit API keys to the repository

// OPENAI_API_KEY=sk-...
// SUPABASE_URL=https://your-project.supabase.co
// SUPABASE_ANON_KEY=eyJ...
// SUPABASE_SERVICE_ROLE_KEY=eyJ... (for server-side only)
// NODE_ENV=production

// ---- next.config.ts ----
// Configure for AI workloads

import type { NextConfig } from 'next'

const nextConfig: NextConfig = {
  // Increase body size limit for conversation history
  experimental: {
    serverActions: {
      bodySizeLimit: '2mb',
    },
  },
  // Configure for Vercel deployment
  serverExternalPackages: ['@langchain/openai', '@langchain/core'],
}

export default nextConfig

// ---- Rate Limiting (Upstash Redis β€” works in serverless) ----
// In-memory Map dies in serverless β€” use Redis

import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'
import { NextRequest, NextResponse } from 'next/server'

const redis = new Redis({ url: process.env.UPSTASH_REDIS_REST_URL!, token: process.env.UPSTASH_REDIS_REST_TOKEN! })

const ratelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per minute
})

export async function rateLimitMiddleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith('/api/chat')) {
    const ip = request.headers.get('x-forwarded-for') ?? 'unknown'
    const { success } = await ratelimit.limit(ip)
    if (!success) {
      return NextResponse.json(
        { error: 'Rate limit exceeded. Please wait before sending another message.' },
        { status: 429 }
      )
    }
  }
  return NextResponse.next()
}

// ---- Middleware Matcher ----
// This config belongs in middleware.ts next to rateLimitMiddleware above;
// it restricts the middleware to API routes
export const config = {
  matcher: ['/api/:path*'],
}

// ---- Error Handling Wrapper ----
// Catch and log agent errors for observability
export async function runAgentWithErrorHandling(
  conversationId: string,
  input: string
) {
  const startTime = Date.now()

  try {
    const agent = await createAgent(conversationId)
    const result = await agent.invoke({ input })

    // Log success
    await logAgentRun({
      conversationId,
      status: 'success',
      duration: Date.now() - startTime,
      steps: result.intermediateSteps?.length ?? 0,
      input,
      output: result.output,
    })

    return result
  } catch (error) {
    // Log failure
    await logAgentRun({
      conversationId,
      status: 'error',
      duration: Date.now() - startTime,
      error: error instanceof Error ? error.message : 'Unknown error',
      input,
    })

    // Return user-friendly error
    if (error instanceof Error && error.message.includes('timeout')) {
      return {
        output: 'The request took too long to process. Please try a simpler question or try again later.',
      }
    }

    if (error instanceof Error && error.message.includes('rate_limit')) {
      return {
        output: 'The AI service is currently experiencing high demand. Please wait a moment and try again.',
      }
    }

    return {
      output: 'I encountered an error processing your request. A support ticket has been created automatically.',
    }
  }
}

async function logAgentRun(data: Record<string, unknown>) {
  const supabase = await createClient()
  await supabase.from('agent_runs').insert(data)
}

// ---- Health Check Endpoint ----
// File: app/api/health/route.ts
export async function GET() {
  const checks = {
    openai: false,
    supabase: false,
  }

  // Check OpenAI
  try {
    const response = await fetch('https://api.openai.com/v1/models', {
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    })
    checks.openai = response.ok
  } catch {
    checks.openai = false
  }

  // Check Supabase
  try {
    const supabase = await createClient()
    const { error } = await supabase.from('conversations').select('id').limit(1)
    checks.supabase = !error
  } catch {
    checks.supabase = false
  }

  const healthy = Object.values(checks).every(Boolean)

  return Response.json(
    { status: healthy ? 'healthy' : 'degraded', checks },
    { status: healthy ? 200 : 503 }
  )
}
πŸ”₯Deployment Considerations
  • Vercel Serverless Functions have a 10-second timeout (300s on Pro) β€” agent loops may exceed this
  • Edge Runtime suits long-running streaming agents: streams can continue past the serverless limit, but Node.js APIs are unavailable
  • Rate limiting prevents abuse β€” 10 messages per minute per user is a reasonable default
  • Health checks verify OpenAI and Supabase connectivity β€” return 503 if either is down
  • Log every agent run with duration, steps, and errors β€” observability is critical for debugging
πŸ“Š Production Insight
Vercel Serverless Functions have a 10-second timeout β€” agent loops with many tools may exceed it.
Rate limiting prevents abuse β€” 10 messages per minute per user is a reasonable baseline.
Rule: log every agent run with duration and error details β€” observability is critical for production.
🎯 Key Takeaway
Deployment requires rate limiting, health checks, and error handling β€” not just the agent code.
Vercel timeout limits constrain agent loop length β€” consider Edge Runtime or separate services.
Log every agent run β€” duration, steps, errors, and token usage for observability.
πŸ—‚ Agent Frameworks Compared
LangChain vs alternatives for building AI agents
Framework         | Language      | Agent Type                 | Streaming               | Tool Ecosystem | Production Ready | Best For
LangChain         | Python, JS/TS | ReAct, Tool Calling        | Yes (with streamEvents) | Large          | Yes*             | Complex multi-tool agents, RAG pipelines
Vercel AI SDK     | JS/TS         | OpenAI Functions           | Yes (native)            | Small          | Yes              | Simple chat agents, streaming-first apps
CrewAI            | Python        | Role-based multi-agent     | Limited                 | Medium         | Growing          | Multi-agent collaboration, research tasks
AutoGen           | Python        | Conversational multi-agent | Yes                     | Medium         | Yes              | Multi-agent conversations, code generation
Direct OpenAI API | Any           | Function calling           | Yes                     | None (manual)  | Yes              | Simple single-tool agents, full control

🎯 Key Takeaways

  • An agent decides tool sequences dynamically β€” unlike chains which follow fixed steps
  • maxSteps on the agent prevents infinite loops β€” always set it to 10 in production
  • Supabase stores conversation history and vector embeddings β€” the database is the agent's memory
  • Conversation summarization reduces token costs by 80-90% while preserving context
  • Tool descriptions guide the LLM's tool selection β€” write them like instructions, not documentation
  • Validate tool inputs with Zod β€” LLMs hallucinate invalid arguments in multi-step workflows

⚠ Common Mistakes to Avoid

    βœ•No maxSteps on the agent
    Symptom

    Agent enters infinite tool-calling loop. Consumes millions of tokens and thousands of dollars in API credits within hours. The agent keeps calling the same tool with slightly different inputs, never reaching a conclusion.

    Fix

    Set maxSteps to 10 on createToolCallingAgent. Add deduplication logic to detect repeated tool results. Set OpenAI usage limits as a safety net. Monitor per-conversation token counts.

    βœ•Vague tool descriptions that do not explain when to use the tool
    Symptom

    Agent either never uses the tool (does not know when it is relevant) or uses it incorrectly (applies it to the wrong type of query). Tool selection accuracy drops below 50%.

    Fix

    Write tool descriptions that explain WHEN to use the tool, not just what it does. Include examples of good queries. List what the tool should NOT be used for. Add .describe() to every Zod schema field.

    βœ•No input validation on tool parameters
    Symptom

    Tool receives hallucinated inputs from the LLM β€” missing required fields, wrong types, malformed queries. Tool crashes with unhandled errors, and the agent gets an unclear error message.

    Fix

    Validate all tool inputs with Zod before execution. Return clear, actionable error messages that tell the agent what went wrong and how to retry with valid inputs.

    βœ•Storing conversation history only in application memory
    Symptom

    Conversation history is lost when the serverless function terminates. Users lose context on every page refresh. Multi-turn conversations do not work across requests.

    Fix

    Store conversation history in Supabase PostgreSQL. Load history from the database on every request. Use a custom history class that matches your token-tracking schema.

    βœ•No conversation summarization for long sessions
    Symptom

    Token costs increase linearly with conversation length. A 50-turn conversation consumes 100,000+ tokens per agent iteration. Monthly API costs exceed budget by 5-10x.

    Fix

    Implement conversation summarization when history exceeds 8,000 tokens. Use gpt-4o-mini for summarization (cheaper). Replace old messages with a summary in the agent context.

    βœ•Using gpt-4o for all tasks including simple lookups
    Symptom

    Simple tool calls (order status lookups, basic searches) cost 15x more than necessary. Monthly API costs are dominated by trivial operations that do not require advanced reasoning.

    Fix

    Use gpt-4o-mini for simple tasks (order lookups, basic searches) and gpt-4o for complex reasoning (multi-tool workflows, refund calculations). Route tasks to the appropriate model based on complexity.

    βœ•No rate limiting on the chat API endpoint
    Symptom

    A single user or bot sends 1,000 messages per minute, exhausting the OpenAI rate limit and causing errors for all users. API costs spike unexpectedly.

    Fix

    Add rate limiting middleware using Upstash Redis (sliding window) β€” 10 messages per minute per user is a reasonable default.

    βœ•Not logging agent runs for observability
    Symptom

    When the agent produces a wrong answer or enters a loop, there is no way to debug what happened. No visibility into which tools were called, what inputs were used, or how many steps were taken.

    Fix

    Log every agent run with: conversation ID, input, output, intermediate steps, duration, token count, and error details. Use these logs to identify patterns in agent failures.

Interview Questions on This Topic

  • QWhat is the difference between an AI agent and a chain in LangChain?Mid-levelReveal
    A chain follows a fixed sequence of steps β€” retrieve context, format prompt, generate response. The sequence is defined at build time and does not change based on the input. An agent decides the sequence dynamically. It uses an LLM to reason about which tool to call, calls the tool, observes the result, and decides the next step. The sequence is determined at runtime based on the input and intermediate results. The trade-off: agents are more flexible (they can handle unexpected queries by choosing different tools) but more expensive (each step is a separate LLM call) and harder to debug (the sequence is non-deterministic). Chains are cheaper, faster, and predictable but limited to predefined workflows.
  • QHow do you prevent an AI agent from entering an infinite loop?SeniorReveal
    Three layers of protection: 1. maxSteps on the agent β€” set to 10 for production. This is the hard stop that prevents infinite loops regardless of the agent's behavior. 2. Prompt-level instructions β€” tell the agent what to do when tools return no new information. Example: "If the search returns the same results twice, synthesize an answer from available information instead of searching again." 3. Tool-level deduplication β€” detect when the same tool is called with the same inputs and return a message like "This query was already searched. The results have not changed. Please answer based on the available information." Additionally, set OpenAI usage limits as a financial safety net, and monitor per-conversation token counts to detect loops early.
  • QHow do you manage conversation memory in a stateless serverless environment?Mid-levelReveal
    Serverless functions are stateless β€” they terminate after each request and do not retain memory. To persist conversation history across requests: 1. Store every message in a database (Supabase PostgreSQL). Each message includes the conversation ID, role, content, and timestamp. 2. On each request, load the conversation history from the database and pass it to the agent as context. 3. Use a custom history class that matches your token-tracking schema. The key insight: the database is the memory, not the application server. Each request loads the memory, processes the message, and writes the result back.
  • QHow do you control token costs for an AI agent in production?SeniorReveal
    Four strategies: 1. Conversation summarization β€” when history exceeds 8,000 tokens, summarize old messages with a cheaper model (gpt-4o-mini). This reduces token count by 80-90% while preserving context. 2. Model selection β€” use gpt-4o-mini for simple tasks (order lookups, basic searches at $0.15/M input tokens) and gpt-4o for complex reasoning ($2.50/M input tokens). Route based on task complexity. 3. Token budgets β€” set a per-conversation limit (e.g., 50,000 tokens). When exceeded, summarize and continue or stop processing. 4. maxSteps β€” limit the number of agent steps per turn. Each step is a separate LLM call with the full history. Fewer steps means fewer tokens. Additionally, track per-conversation token usage in the database, set daily/monthly budgets, and alert when spending exceeds thresholds.
  • QWhat is the role of Supabase in an AI agent architecture?JuniorReveal
    Supabase serves three roles: 1. Conversation storage β€” PostgreSQL stores conversation history (messages table) and conversation metadata (conversations table). This enables session persistence across requests in a stateless serverless environment. 2. Vector search β€” pgvector stores document embeddings for the knowledge base. The match_documents RPC function performs cosine similarity search, enabling the agent to retrieve relevant documents based on semantic meaning rather than keyword matching. 3. Row Level Security β€” RLS policies ensure users can only access their own conversations. This provides authorization at the database level, independent of the application code. Additionally, Supabase can store conversation summaries (for cost control), usage logs (for billing and monitoring), and agent run logs (for observability).

Frequently Asked Questions

Can I use a different LLM instead of OpenAI with LangChain?

Yes. LangChain supports many LLM providers: Anthropic (ChatAnthropic), Google (ChatGoogleGenerativeAI), Azure OpenAI (AzureChatOpenAI), Ollama (ChatOllama for local models), and any OpenAI-compatible API. The agent framework works the same regardless of the LLM β€” only the model initialization changes. For function calling (tool use), ensure the model supports it β€” OpenAI and Anthropic models have native support, while others may require prompt-based tool calling.

How do I test an AI agent in my CI/CD pipeline?

Test at three levels: unit test individual tools (mock external calls, verify output format), integration test the agent with a fixed input (verify it calls the expected tools and produces a correct answer), and end-to-end test the API endpoint (verify streaming, authentication, and error handling). For deterministic tests, use temperature=0 and record/replay LLM responses with tools like VCR or Polly. Mock the Supabase client for database operations.
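At the unit level, the cheapest pattern is to factor the tool's deterministic logic into pure functions so assertions need no LLM, mock or real. A sketch with hypothetical names:

```typescript
// Hypothetical document shape returned by the vector search
type Doc = { title: string; content: string; similarity: number }

// Pure formatting function factored out of the search tool's func,
// so it can be unit-tested without any network or LLM call
function formatSearchResults(docs: Doc[]): string {
  if (docs.length === 0) return 'No results found.'
  return docs
    .map((d, i) => `[Result ${i + 1}] (similarity: ${d.similarity.toFixed(2)})\nTitle: ${d.title}`)
    .join('\n\n')
}

// Deterministic assertions for this layer
const out = formatSearchResults([{ title: 'Pricing', content: '...', similarity: 0.91 }])
console.assert(out.startsWith('[Result 1]'))
console.assert(out.includes('0.91'))
console.assert(formatSearchResults([]) === 'No results found.')
```

The tool's func then becomes a thin wrapper around the pure function plus the external calls, and only the integration tests need recorded LLM responses.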

How do I handle multi-agent architectures where agents delegate to each other?

LangChain supports agent delegation through the agent's tools β€” one agent can be wrapped as a tool for another agent. The supervisor agent decides which specialist agent to call (e.g., a billing agent, a technical support agent) and passes the relevant context. CrewAI and AutoGen provide higher-level abstractions for multi-agent collaboration with role definitions and conversation patterns.

What is the difference between gpt-4o and gpt-4o-mini for agent tasks?

gpt-4o is better at complex reasoning, multi-step planning, and understanding nuanced tool descriptions. gpt-4o-mini is faster and 15x cheaper but may struggle with complex tool selection or multi-step workflows. Use gpt-4o-mini for simple tasks (single tool calls, order lookups, basic searches) and gpt-4o for complex tasks (multi-tool workflows, refund calculations). The cost difference is significant: $0.15/M vs $2.50/M input tokens.

How do I deploy an AI agent that exceeds Vercel's serverless timeout?

Three options: use Vercel Edge Runtime (streaming responses can continue past the serverless limit, but Node.js APIs are unavailable, so it works only for pure API calls), move the agent logic to a separate service (AWS Lambda with a 15-minute timeout, or a containerized service on ECS/Fargate with no timeout), or implement a queue-based architecture (the user submits a message, a worker processes it asynchronously, and the client polls or receives a webhook when done). The queue approach is the most scalable but adds complexity.

πŸ”₯
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
