How to Build an AI Agent with Next.js, LangChain & Supabase
- An AI agent uses an LLM to decide which tools to call and in what order — unlike a chain, it reasons about its next action
- LangChain provides the agent framework — createToolCallingAgent + RunnableWithMessageHistory run the ReAct loop (reason, act, observe)
- Supabase stores conversation history and vector embeddings for long-term memory
- Next.js 16 Server Actions and Route Handlers stream responses via the Vercel AI SDK (streamText)
- Token costs scale with conversation length — truncate or summarize history to control spend
- Biggest mistake: no tool guardrails — agents can call tools indefinitely without maxSteps or deduplication
Production Debug Guide
Diagnose agent loops, tool failures, and memory issues.

Agent not calling tools:

```shell
grep -rn 'tool\|Tool\|description' lib/agent/ --include='*.ts' | head -20
cat lib/agent/tools.ts | head -60
```

Supabase vector search returning empty:

```shell
curl -s "${SUPABASE_URL}/rest/v1/documents?select=count" -H "apikey: ${SUPABASE_ANON_KEY}" | jq
curl -s "${SUPABASE_URL}/rest/v1/rpc/match_documents" -X POST -H "apikey: ${SUPABASE_ANON_KEY}" -H "Content-Type: application/json" -d '{"query_embedding":[0.1,0.2],"match_count":5}' | jq
```

Streaming not working:

```shell
grep -rn 'streamText\|toDataStreamResponse' app/api/ --include='*.ts' | head -10
curl -s -N http://localhost:3000/api/chat -X POST -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"hello"}]}' | head -5
```

Token costs spiking:

```shell
grep -rn 'tokenCount\|token_count\|usage' lib/agent/ --include='*.ts' | head -10
psql "${DATABASE_URL}" -c "SELECT conversation_id, SUM(token_count) as total FROM messages GROUP BY conversation_id ORDER BY total DESC LIMIT 10"
```
Building an AI agent requires orchestrating four components: the LLM (reasoning engine), tools (external capabilities), memory (conversation history), and a runtime loop (agent executor). LangChain provides the agent framework with ReAct prompting — the model reasons about what to do, selects a tool, observes the result, and decides the next step.
Supabase serves two roles: PostgreSQL stores conversation history for session continuity, and pgvector stores embeddings for semantic search over past interactions. Next.js 16 provides the API layer — Server Actions and Route Handlers process requests, and the Vercel AI SDK streams token-by-token responses to the client.
The production challenges are cost control (token usage scales with history length), latency (multi-step agent loops add sequential delay), reliability (tools can fail, LLMs can hallucinate tool inputs), and observability (debugging why an agent chose a specific action). This guide covers the complete implementation with production patterns for each.
Architecture: Agent, Tools, Memory, and Runtime
An AI agent has four components that work together in a loop. The LLM is the reasoning engine — it reads the conversation history and tool descriptions, decides which tool to call, and interprets the result. Tools are external functions the agent can invoke, such as code execution or API calls. Memory stores conversation history so the agent maintains context across turns. The runtime (AgentExecutor) orchestrates the loop: send context to the LLM, parse the tool call, execute the tool, append the result, repeat.
The key difference between an agent and a chain: a chain follows a fixed sequence of steps (retrieve -> format -> generate). An agent decides the sequence dynamically — it might search, then calculate, then search again, then answer. This flexibility comes at a cost: each step is a separate LLM call, adding latency and token usage.
LangChain's ReAct agent type implements this pattern. The prompt instructs the LLM to Think (reason about what to do), Act (call a tool with specific inputs), and Observe (read the tool result). The loop continues until the agent decides it has enough information to answer, or maxSteps is reached.
```typescript
// ============================================
// AI Agent Architecture — Core Components (2026 LangChain 0.3+)
// ============================================

// ---- Component 1: LLM (Reasoning Engine) ----
// The LLM decides which tools to call and interprets results
import { ChatOpenAI } from '@langchain/openai'

const llm = new ChatOpenAI({
  modelName: 'gpt-4o',
  temperature: 0, // Deterministic for tool-calling — reduces hallucination
  openAIApiKey: process.env.OPENAI_API_KEY,
})

// ---- Component 2: Tools (External Capabilities) ----
// Each tool has a name, description, schema, and execution function
import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description:
    'Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.',
  schema: z.object({
    query: z.string().describe('The search query — use keywords from the user question'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general']).optional()
      .describe('Filter by category if the topic is clear'),
  }),
  func: async ({ query, category }) => {
    // Search Supabase vector store
    const results = await searchDocuments(query, category)
    if (results.length === 0) {
      return 'No results found. Try a different search query or answer from general knowledge.'
    }
    return results.map((r) => `[${r.title}]: ${r.content}`).join('\n\n')
  },
})

const getOrderStatus = new DynamicStructuredTool({
  name: 'get_order_status',
  description:
    'Look up the status of a customer order by order ID. Returns shipping status, estimated delivery, and tracking number.',
  schema: z.object({
    orderId: z.string().describe('The order ID — format: ORD-XXXXX'),
  }),
  func: async ({ orderId }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) {
      return `Order ${orderId} not found. Ask the user to verify the order ID.`
    }
    return JSON.stringify({
      status: order.status,
      estimatedDelivery: order.estimatedDelivery,
      trackingNumber: order.trackingNumber,
    })
  },
})

const calculateRefund = new DynamicStructuredTool({
  name: 'calculate_refund',
  description:
    'Calculate the refund amount for a return. Considers order total, return reason, and days since purchase.',
  schema: z.object({
    orderId: z.string().describe('The order ID'),
    returnReason: z.enum(['defective', 'wrong_item', 'changed_mind', 'not_as_described'])
      .describe('The reason for the return'),
  }),
  func: async ({ orderId, returnReason }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) return 'Order not found.'
    const daysSincePurchase = Math.floor(
      (Date.now() - new Date(order.createdAt).getTime()) / (1000 * 60 * 60 * 24)
    )
    let refundPercentage = 1.0
    if (returnReason === 'changed_mind' && daysSincePurchase > 30) {
      refundPercentage = 0.0
    } else if (returnReason === 'changed_mind') {
      refundPercentage = 0.85
    }
    const refundAmount = order.total * refundPercentage
    return JSON.stringify({ refundAmount, refundPercentage, daysSincePurchase })
  },
})

// ---- Component 3: Memory (Conversation History) ----
// Stored in Supabase PostgreSQL for persistence across requests
// (assumes a shared `supabase` client, e.g. from lib/supabase)
import { SupabaseChatMessageHistory } from '@langchain/community/stores/message/supabase'

function createMemory(conversationId: string) {
  return new SupabaseChatMessageHistory({
    supabaseClient: supabase,
    tableName: 'messages',
    sessionId: conversationId,
  })
}

// ---- Component 4: Agent (2026 pattern) ----
// createToolCallingAgent + AgentExecutor + RunnableWithMessageHistory
import { AgentExecutor, createToolCallingAgent } from 'langchain/agents'
import { RunnableWithMessageHistory } from '@langchain/core/runnables'
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'

const prompt = ChatPromptTemplate.fromMessages([
  ['system', `You are a helpful customer support agent for Acme Corp.
You have access to the following tools:
- search_knowledge_base: Search the internal knowledge base
- get_order_status: Look up order status by order ID
- calculate_refund: Calculate refund amounts for returns

Rules:
1. Always search the knowledge base before answering product questions
2. If a tool returns no new information, answer from what you know
3. Never make up order statuses — always use get_order_status
4. If you cannot answer, say so and offer to connect with a human agent
5. Maximum 10 tool calls per response — synthesize your answer after that`],
  new MessagesPlaceholder('chat_history'),
  ['human', '{input}'],
  new MessagesPlaceholder('agent_scratchpad'),
])

async function createAgent(conversationId: string) {
  const tools = [searchKnowledgeBase, getOrderStatus, calculateRefund]
  const agent = await createToolCallingAgent({ llm, tools, prompt })

  // AgentExecutor runs the tool loop; maxIterations is the maxSteps guardrail
  const executor = new AgentExecutor({ agent, tools, maxIterations: 10 })

  return new RunnableWithMessageHistory({
    runnable: executor,
    getMessageHistory: () => createMemory(conversationId),
    inputMessagesKey: 'input',
    historyMessagesKey: 'chat_history',
  })
}
```
- Think: the LLM reads the conversation history and tool descriptions, decides which tool to call
- Act: the agent calls the tool with specific inputs parsed from the LLM output
- Observe: the tool result is appended to the conversation, and the LLM decides the next step
- Each step is a separate LLM call — latency and cost scale with the number of iterations
- maxSteps stops the loop — without it, the agent can run indefinitely and drain API credits
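The loop above can be sketched framework-free. Everything here is an illustrative stand-in — `fakeLlm` and the `tools` map are not LangChain APIs — but the control flow (think, act, observe, bounded by maxSteps) is the same one AgentExecutor runs:

```typescript
// Minimal, dependency-free sketch of the agent runtime loop.
// fakeLlm and the tools map are hypothetical stand-ins, not LangChain APIs.
type ToolCall = { tool: string; input: string }
type LlmDecision = { type: 'call'; call: ToolCall } | { type: 'answer'; text: string }

// A stand-in "LLM" that calls a tool once, then answers from the observation.
function fakeLlm(transcript: string[]): LlmDecision {
  const sawObservation = transcript.some((m) => m.startsWith('observe:'))
  if (!sawObservation) {
    return { type: 'call', call: { tool: 'search', input: 'refund policy' } }
  }
  return { type: 'answer', text: 'Refunds are available within 30 days.' }
}

const tools: Record<string, (input: string) => string> = {
  search: (input) => `top result for "${input}"`,
}

function runAgentLoop(userMessage: string, maxSteps = 10): { answer: string; steps: number } {
  const transcript = [`user: ${userMessage}`]
  for (let step = 1; step <= maxSteps; step++) {
    const decision = fakeLlm(transcript)             // Think
    if (decision.type === 'answer') {
      return { answer: decision.text, steps: step - 1 }
    }
    const tool = tools[decision.call.tool]
    const observation = tool
      ? tool(decision.call.input)                    // Act
      : `error: unknown tool "${decision.call.tool}"`
    transcript.push(`observe: ${observation}`)       // Observe, then loop
  }
  // maxSteps reached — force a final answer instead of looping forever
  return { answer: 'I could not complete this request.', steps: maxSteps }
}
```

A real executor parses structured tool calls from model output rather than branching on a flag, but the maxSteps fallback at the bottom is exactly the guardrail that prevents the credit-draining infinite loop.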
Supabase: Memory, Vector Search, and Conversation Storage
Supabase serves three roles in the agent architecture. PostgreSQL stores conversation history β every message (user and assistant) is persisted for session continuity. pgvector stores document embeddings for semantic search β the agent can retrieve relevant knowledge base entries. The Supabase client provides real-time subscriptions if you want to show agent activity to the user in real time.
The conversation storage pattern: each conversation has a unique ID. Messages are appended to a messages table with the conversation_id as a foreign key. When the agent processes a new message, it loads the conversation history from Supabase and prepends it to the LLM context. This gives the agent memory across turns without storing state in the application server.
Vector search uses pgvector's cosine similarity. Documents are chunked, embedded with OpenAI's text-embedding-3-small model, and stored with their embedding vectors. The match_documents RPC function performs nearest-neighbor search and returns the top-k results above a similarity threshold (use 0.78–0.82 for text-embedding-3-small in production).
```sql
-- ============================================
-- Supabase Schema for AI Agent
-- ============================================

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- ---- Conversations Table ----
-- Each conversation has a unique ID and tracks the user
CREATE TABLE public.conversations (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  title TEXT,
  total_tokens INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.conversations ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage their own conversations"
  ON public.conversations FOR ALL
  USING (auth.uid() = user_id)
  WITH CHECK (auth.uid() = user_id);

-- ---- Messages Table ----
-- Stores each message in a conversation
CREATE TABLE public.messages (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system', 'tool')),
  content TEXT NOT NULL,
  tool_calls JSONB,   -- Stores tool call details for assistant messages
  tool_call_id TEXT,  -- Links tool response to the original call
  token_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.messages ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage messages in their conversations"
  ON public.messages FOR ALL
  USING (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  )
  WITH CHECK (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  );

-- Index for fast conversation history retrieval
CREATE INDEX idx_messages_conversation_created
  ON public.messages (conversation_id, created_at ASC);

-- ---- Documents Table ----
-- Knowledge base with vector embeddings
CREATE TABLE public.documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  category TEXT,           -- product, pricing, troubleshooting, general
  embedding vector(1536),  -- text-embedding-3-small produces 1536-dim vectors
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Documents are readable by authenticated users"
  ON public.documents FOR SELECT
  USING (auth.role() = 'authenticated');

-- Index for vector similarity search
CREATE INDEX ON public.documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- ---- Vector Search RPC Function ----
-- Performs nearest-neighbor search using cosine similarity
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_count INTEGER DEFAULT 5,
  match_threshold FLOAT DEFAULT 0.8,  -- Production value for text-embedding-3-small
  filter_category TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  category TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.title,
    documents.content,
    documents.category,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM public.documents
  WHERE (filter_category IS NULL OR documents.category = filter_category)
    AND 1 - (documents.embedding <=> query_embedding) > match_threshold
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

-- ---- Conversation Summaries Table ----
-- Stores summarized history for long conversations
CREATE TABLE public.conversation_summaries (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  summary TEXT NOT NULL,
  message_count INTEGER NOT NULL,  -- Number of messages summarized
  token_count INTEGER NOT NULL,    -- Token count of the original messages
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- ---- Updated At Trigger ----
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$;

CREATE TRIGGER conversations_updated_at
  BEFORE UPDATE ON public.conversations
  FOR EACH ROW
  EXECUTE FUNCTION update_updated_at();
```
- PostgreSQL stores conversation history — every message persisted for session continuity
- pgvector stores document embeddings — agent retrieves relevant knowledge via semantic search
- Conversation summaries compress long histories — control token costs for extended sessions
- RLS policies ensure users only access their own conversations — security at the database level
- The match_documents RPC function performs nearest-neighbor search with configurable threshold (0.78-0.82 recommended)
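The ingestion side is not shown above: documents must be chunked, embedded, and inserted before match_documents can find anything. Here is a sketch with the embedding call and database write injected as callbacks — the chunk sizes, the part-numbering scheme, and both callback signatures are assumptions, and in production `embed` would wrap the OpenAI embeddings API while `insert` would wrap `supabase.from('documents').insert(...)`:

```typescript
// Hypothetical ingestion pipeline: chunk the document, embed each chunk,
// insert rows shaped like the documents table. Callbacks are injected so the
// orchestration stays testable without network access.
type DocumentRow = { title: string; content: string; category: string | null; embedding: number[] }

// Split text into overlapping chunks so retrieval hits keep local context.
export function chunkText(text: string, chunkSize = 800, overlap = 100): string[] {
  const chunks: string[] = []
  let start = 0
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize))
    if (start + chunkSize >= text.length) break
    start += chunkSize - overlap
  }
  return chunks
}

export async function ingestDocument(
  title: string,
  text: string,
  category: string | null,
  embed: (chunk: string) => Promise<number[]>,  // e.g. text-embedding-3-small
  insert: (row: DocumentRow) => Promise<void>   // e.g. supabase documents insert
): Promise<number> {
  const chunks = chunkText(text)
  for (const [i, chunk] of chunks.entries()) {
    const embedding = await embed(chunk)
    await insert({ title: `${title} (part ${i + 1})`, content: chunk, category, embedding })
  }
  return chunks.length // Number of rows written
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side; tune chunkSize to your embedding model's effective context rather than treating 800 characters as a rule.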
Next.js Integration: Streaming API and Server Actions
The Next.js integration has two concerns: the API layer that processes agent requests, and the streaming layer that delivers token-by-token responses to the client. The Vercel AI SDK (ai package) handles both — it provides useChat for the client and streamText with toDataStreamResponse for the server.
The API route receives the user's message, loads conversation history from Supabase, runs the agent, and streams the response. The client uses useChat to manage the message list and display streaming tokens. The key pattern: the Route Handler returns streamText(...).toDataStreamResponse(), which is a streaming response that the client consumes incrementally.
For non-streaming use cases (background processing, webhook-triggered agents), use Server Actions instead. They run the agent synchronously and return the result. Server Actions are simpler but do not support streaming — the user waits for the full response.
```tsx
// ============================================
// Client Component — Chat Interface with Streaming
// ============================================
// File: components/chat-interface.tsx
'use client'

import { useChat } from 'ai/react'
import { useState } from 'react'
import { Button } from '@/components/ui/button'
import { Input } from '@/components/ui/input'

export function ChatInterface({ conversationId }: { conversationId?: string }) {
  const [activeConversationId, setActiveConversationId] = useState(conversationId)

  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
    body: { conversationId: activeConversationId },
    onFinish: (message) => {
      // Extract conversation ID from response headers
      // (set by the server on the streaming response)
    },
    onError: (error) => {
      console.error('Chat error:', error)
    },
  })

  return (
    <div className="flex flex-col h-[600px] max-w-2xl mx-auto">
      {/* Messages */}
      <div className="flex-1 overflow-y-auto space-y-4 p-4">
        {messages.map((message) => (
          <div
            key={message.id}
            className={`flex ${message.role === 'user' ? 'justify-end' : 'justify-start'}`}
          >
            <div
              className={`max-w-[80%] rounded-lg px-4 py-2 ${
                message.role === 'user'
                  ? 'bg-primary text-primary-foreground'
                  : 'bg-muted'
              }`}
            >
              <p className="text-sm whitespace-pre-wrap">{message.content}</p>
            </div>
          </div>
        ))}
        {isLoading && (
          <div className="flex justify-start">
            <div className="bg-muted rounded-lg px-4 py-2">
              <div className="flex space-x-1">
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce" />
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce delay-100" />
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce delay-200" />
              </div>
            </div>
          </div>
        )}
      </div>

      {/* Input */}
      <form onSubmit={handleSubmit} className="flex gap-2 p-4 border-t">
        <Input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask a question..."
          disabled={isLoading}
          className="flex-1"
        />
        <Button type="submit" disabled={isLoading}>
          Send
        </Button>
      </form>
    </div>
  )
}
```

```ts
// ============================================
// Server Action — Non-Streaming Agent (Background Processing)
// ============================================
// File: app/actions/agent.ts
'use server'

import { createAgent } from '@/lib/agent/factory'
import { createClient } from '@/lib/supabase/server'

export async function processAgentMessage(
  conversationId: string,
  message: string
) {
  const supabase = await createClient()
  const { data: { user } } = await supabase.auth.getUser()
  if (!user) {
    throw new Error('Unauthorized')
  }

  // Store user message
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'user',
    content: message,
  })

  // Run agent (non-streaming)
  const agent = await createAgent(conversationId)
  const result = await agent.invoke({ input: message })

  // Store assistant response
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'assistant',
    content: result.output,
  })

  return {
    answer: result.output,
    steps: result.intermediateSteps?.length ?? 0,
  }
}
```
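The streaming Route Handler that the client above posts to is not shown. A sketch of app/api/chat/route.ts, under the assumption that the AI SDK's LangChainAdapter is available to bridge LangChain's streamEvents output to the data-stream protocol useChat expects; auth and message persistence mirror the Server Action, and the exact bridge varies across ai/langchain versions, so treat this as a starting point rather than a drop-in file:

```typescript
// File: app/api/chat/route.ts (hypothetical)
// Streams the LangChain agent's tokens to useChat via LangChainAdapter.
import { LangChainAdapter } from 'ai'
import { createAgent } from '@/lib/agent/factory'
import { createClient } from '@/lib/supabase/server'

export const maxDuration = 60 // Seconds — raise for multi-step agent loops

export async function POST(req: Request) {
  const { messages, conversationId } = await req.json()

  const supabase = await createClient()
  const { data: { user } } = await supabase.auth.getUser()
  if (!user) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 })
  }

  // Persist the latest user message before running the agent
  const input = messages[messages.length - 1]?.content ?? ''
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'user',
    content: input,
  })

  const agent = await createAgent(conversationId)
  // streamEvents (v2) emits token-level events the adapter can translate
  const stream = agent.streamEvents({ input }, { version: 'v2' })
  return LangChainAdapter.toDataStreamResponse(stream)
}
```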
Tools: Building and Validating Agent Capabilities
Tools are the agent's external capabilities — each tool is a function with a name, description, and input schema. The LLM reads the tool descriptions to decide which tool to call, and parses the input schema to generate valid arguments. The quality of the description directly affects the agent's ability to use the tool correctly.
The critical pattern: validate tool inputs with Zod before execution. LLMs can hallucinate invalid inputs — missing required fields, wrong types, or malformed queries. Zod validation catches these before the tool executes, returning a clear error message that the agent can learn from.
Tool design principles: descriptions should be specific (not just "search the database"), input schemas should use .describe() on every field (the LLM reads these descriptions), and error messages should be actionable (tell the agent what went wrong and how to fix it).
```typescript
// ============================================
// Tool Design Patterns
// ============================================
import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

// ---- Good Tool: Clear description, validated inputs, actionable errors + deduplication ----

// Deduplication guardrail — module-level cache, scoped to the serverless instance
const seenResults = new Set<string>()

export const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: `Search the internal knowledge base for answers to customer questions.
Use this when the user asks about product features, pricing, or troubleshooting.

Do NOT use this tool for:
- Order status queries (use get_order_status instead)
- Calculations (use calculate tool instead)
- Questions about the current date or time`,
  schema: z.object({
    query: z.string()
      .min(3, 'Query must be at least 3 characters')
      .max(200, 'Query must be under 200 characters')
      .describe('The search query — use specific keywords from the user question. Avoid full sentences.'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general'])
      .optional()
      .describe('Filter by category if the topic is clear from the question'),
    maxResults: z.number()
      .min(1)
      .max(10)
      .default(5)
      .describe('Maximum number of results to return — use 3 for focused answers, 5 for comprehensive'),
  }),
  func: async ({ query, category, maxResults }) => {
    try {
      // Block repeated identical searches within this instance
      const resultHash = JSON.stringify({ query, category }).slice(0, 100)
      if (seenResults.has(resultHash)) {
        return "You've already seen this result. Answer from context or ask for clarification."
      }
      seenResults.add(resultHash)

      // Generate embedding for the query
      const embedding = await generateEmbedding(query)

      // Search Supabase vector store
      const { data, error } = await supabase.rpc('match_documents', {
        query_embedding: embedding,
        match_count: maxResults,
        match_threshold: 0.8, // Production value
        filter_category: category ?? null,
      })

      if (error) {
        return `Search failed: ${error.message}. Try a simpler query or remove the category filter.`
      }
      if (!data || data.length === 0) {
        return `No results found for "${query}". Try:
1. Using different keywords
2. Removing the category filter
3. Broadening the search terms`
      }
      return data.map((doc: any, i: number) =>
        `[Result ${i + 1}] (similarity: ${doc.similarity.toFixed(2)})
Title: ${doc.title}
Content: ${doc.content.slice(0, 500)}...`
      ).join('\n\n')
    } catch (err) {
      return `Search error: ${err instanceof Error ? err.message : 'Unknown error'}. The user should try again or contact support.`
    }
  },
})

// ---- Good Tool: Input validation with meaningful error messages ----
export const createTicket = new DynamicStructuredTool({
  name: 'create_support_ticket',
  description:
    'Create a support ticket when the agent cannot resolve the issue. Use this as a last resort after exhausting available tools.',
  schema: z.object({
    subject: z.string()
      .min(10, 'Subject must be at least 10 characters — provide a clear summary')
      .max(200)
      .describe('Brief subject line summarizing the issue'),
    description: z.string()
      .min(50, 'Description must be at least 50 characters — include what was tried and what failed')
      .max(2000)
      .describe('Detailed description including what the user tried and what went wrong'),
    priority: z.enum(['low', 'medium', 'high', 'urgent'])
      .describe('Priority based on impact: low = question, medium = minor issue, high = blocked, urgent = production down'),
    category: z.enum(['billing', 'technical', 'account', 'feature_request'])
      .describe('Category of the issue'),
  }),
  func: async ({ subject, description, priority, category }) => {
    // Validate business rules
    if (priority === 'urgent' && category !== 'technical') {
      return 'Urgent priority is only available for technical issues. Please use high priority instead.'
    }
    const ticket = await createTicketInDatabase({ subject, description, priority, category })
    return `Ticket created successfully.
Ticket ID: ${ticket.id}
Priority: ${priority}
Expected response time: ${getResponseTime(priority)}

Tell the user: "I've created a support ticket (ID: ${ticket.id}) for you. A team member will respond within ${getResponseTime(priority)}."`
  },
})

// ---- Helper: Generate embedding ----
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  })
  return response.data[0].embedding
}

function getResponseTime(priority: string): string {
  const times: Record<string, string> = {
    low: '48 hours',
    medium: '24 hours',
    high: '4 hours',
    urgent: '1 hour',
  }
  return times[priority] ?? '24 hours'
}
```
- Descriptions must explain WHEN to use the tool — not just what it does
- Every input field needs .describe() — the LLM reads these to generate valid arguments
- Validate inputs with Zod before execution — LLMs can hallucinate invalid inputs
- Error messages must be actionable — tell the agent what went wrong and how to retry
- Return formatted text, not raw JSON — LLMs parse natural language better than JSON
Cost Control: Token Management and Conversation Summarization
Token costs are the primary production concern for AI agents. Each agent iteration is a separate LLM call that includes the full conversation history, tool descriptions, and the agent's reasoning. A 20-turn conversation with 10 tool calls per turn can consume 100,000+ tokens — costing $1–10 depending on the model.
Three strategies control costs: history truncation (keep only the last N messages), conversation summarization (replace old messages with a summary), and model selection (use gpt-4o-mini for simple tasks, gpt-4o for complex reasoning). Summarization is the most effective — it preserves context while reducing token count by 80-90%.
```typescript
// ============================================
// Cost Control: Token Management
// ============================================
import { ChatOpenAI } from '@langchain/openai'
import { createClient } from '@/lib/supabase/server'

// Rough token estimate — roughly 4 characters per token for English text
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// ---- Strategy 1: Conversation Summarization ----
// Summarize old messages when the conversation gets too long
// Reduces token count by 80-90% while preserving context
export async function getConversationHistory(
  conversationId: string,
  maxTokens: number = 8000
) {
  const supabase = await createClient()

  // Get the latest summary (if any)
  const { data: summary } = await supabase
    .from('conversation_summaries')
    .select('summary, message_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: false })
    .limit(1)
    .single()

  // Get messages after the summary
  const offset = summary?.message_count ?? 0
  const { data: recentMessages } = await supabase
    .from('messages')
    .select('role, content, token_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: true })
    .range(offset, offset + 49)

  if (!recentMessages) return []

  // Build the history
  const history: { role: string; content: string }[] = []

  // Add summary as a system message if it exists
  if (summary) {
    history.push({
      role: 'system',
      content: `[Previous conversation summary]: ${summary.summary}`,
    })
  }

  // Add recent messages until the token budget is spent
  let totalTokens = summary ? estimateTokens(summary.summary) : 0
  for (const msg of recentMessages) {
    if (totalTokens + (msg.token_count ?? 0) > maxTokens) {
      break // Stop adding messages when we hit the token limit
    }
    history.push({ role: msg.role, content: msg.content })
    totalTokens += msg.token_count ?? 0
  }

  return history
}

// ---- Strategy 2: Model Selection Based on Task Complexity ----
// Use cheaper models for simple tasks, expensive models for complex reasoning
export function selectModel(taskComplexity: 'simple' | 'moderate' | 'complex') {
  const models = {
    simple: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 500, // Limit output tokens for simple responses
    }),
    moderate: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 1000,
    }),
    complex: new ChatOpenAI({
      modelName: 'gpt-4o',
      temperature: 0,
      maxTokens: 2000,
    }),
  }
  return models[taskComplexity]
}

// ---- Strategy 3: Token Budget Per Conversation ----
// Track and enforce token limits per conversation
export async function checkTokenBudget(conversationId: string): Promise<boolean> {
  const supabase = await createClient()
  const MAX_TOKENS_PER_CONVERSATION = 50000

  const { data: conversation } = await supabase
    .from('conversations')
    .select('total_tokens')
    .eq('id', conversationId)
    .single()

  if (!conversation) return true

  if (conversation.total_tokens >= MAX_TOKENS_PER_CONVERSATION) {
    // Summarize and continue
    await summarizeConversation(conversationId)
    return true
  }
  return true
}

// ---- Cost Estimation ----
// Estimate cost before running the agent
const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.50 / 1_000_000, output: 10.00 / 1_000_000 },
  'gpt-4o-mini': { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 },
  'text-embedding-3-small': { input: 0.02 / 1_000_000, output: 0 },
}

export function estimateCost(
  modelName: string,
  inputTokens: number,
  outputTokens: number
): number {
  const costs = MODEL_COSTS[modelName] ?? MODEL_COSTS['gpt-4o']
  return (inputTokens * costs.input) + (outputTokens * costs.output)
}

// ---- Usage Tracking ----
// Store token usage per conversation for billing and monitoring
export async function trackUsage(
  conversationId: string,
  modelName: string,
  inputTokens: number,
  outputTokens: number
) {
  const supabase = await createClient()
  const cost = estimateCost(modelName, inputTokens, outputTokens)

  await supabase.from('usage_logs').insert({
    conversation_id: conversationId,
    model: modelName,
    input_tokens: inputTokens,
    output_tokens: outputTokens,
    cost,
  })

  // Update the conversation total atomically via RPC
  // (an .update() whose value is supabase.rpc(...) does not work)
  await supabase.rpc('increment_conversation_tokens', {
    conv_id: conversationId,
    tokens: inputTokens + outputTokens,
  })
}
```
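checkTokenBudget above calls summarizeConversation, which is never defined. Here is a sketch of its bookkeeping half, with the actual LLM call injected as a callback — the function name, the keepRecent default, and the callback are assumptions; the returned record matches the conversation_summaries columns, and in production `summarize` would be a gpt-4o-mini prompt over the transcript followed by an insert of the record:

```typescript
// Hypothetical core of summarizeConversation: fold everything except the most
// recent messages into one summary record. The summarize callback is injected
// so the bookkeeping can be verified without an LLM.
type StoredMessage = { role: string; content: string; token_count: number }
type SummaryRecord = { summary: string; message_count: number; token_count: number }

export async function buildConversationSummary(
  messages: StoredMessage[],
  summarize: (transcript: string) => Promise<string>, // e.g. a gpt-4o-mini call
  keepRecent = 10
): Promise<SummaryRecord | null> {
  // Keep the tail verbatim; compress only the older prefix
  const toSummarize = messages.slice(0, Math.max(0, messages.length - keepRecent))
  if (toSummarize.length === 0) return null // Nothing old enough to compress

  const transcript = toSummarize.map((m) => `${m.role}: ${m.content}`).join('\n')
  return {
    summary: await summarize(transcript),
    message_count: toSummarize.length, // Becomes the offset in getConversationHistory
    token_count: toSummarize.reduce((sum, m) => sum + m.token_count, 0),
  }
}
```

message_count is the load-bearing field: getConversationHistory uses it as the range offset, so it must count exactly the messages that were folded into the summary, not the whole conversation.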
Deployment: Environment, Monitoring, and Failure Handling
Production deployment requires three additions beyond the development setup: environment variable management (API keys, Supabase credentials), monitoring (token usage, error rates, latency), and failure handling (tool errors, LLM timeouts, rate limits).
The deployment target matters: Vercel has a 10-second default timeout for Serverless Functions (up to 300 seconds on the Pro plan). Agent loops with multiple tool calls can exceed this. For long-running agents, stream the response so the function keeps the connection open past the initial window, or deploy the agent logic to a separate service (AWS Lambda with its 15-minute maximum timeout, or a containerized service with no timeout at all).
```typescript
// ============================================
// Deployment Configuration
// ============================================

// ---- Environment Variables (.env.local) ----
// NEVER commit API keys to the repository
// OPENAI_API_KEY=sk-...
// SUPABASE_URL=https://your-project.supabase.co
// SUPABASE_ANON_KEY=eyJ...
// SUPABASE_SERVICE_ROLE_KEY=eyJ... (for server-side use only)
// NODE_ENV=production

// ---- next.config.ts ----
// Configure for AI workloads
import type { NextConfig } from 'next'

const nextConfig: NextConfig = {
  // Increase body size limit for conversation history
  experimental: {
    serverActions: {
      bodySizeLimit: '2mb',
    },
  },
  // Keep LangChain packages external to the Vercel server bundle
  serverExternalPackages: ['@langchain/openai', '@langchain/core'],
}

export default nextConfig

// ---- Middleware: Rate Limiting (middleware.ts) ----
// Prevent abuse by limiting requests per user.
// An in-memory Map dies between serverless invocations; Upstash Redis works
// over REST and survives in serverless environments.
import { NextRequest, NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
})

const ratelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per minute
})

export async function rateLimitMiddleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith('/api/chat')) {
    const ip = request.headers.get('x-forwarded-for') ?? 'unknown'
    const { success } = await ratelimit.limit(ip)
    if (!success) {
      return NextResponse.json(
        { error: 'Rate limit exceeded. Please wait before sending another message.' },
        { status: 429 }
      )
    }
  }
  return NextResponse.next()
}

export const config = {
  matcher: ['/api/:path*'],
}

// ---- Error Handling Wrapper ----
// Catch and log agent errors for observability
export async function runAgentWithErrorHandling(
  conversationId: string,
  input: string
) {
  const startTime = Date.now()
  try {
    const agent = await createAgent(conversationId)
    const result = await agent.invoke({ input })

    // Log success
    await logAgentRun({
      conversationId,
      status: 'success',
      duration: Date.now() - startTime,
      steps: result.intermediateSteps?.length ?? 0,
      input,
      output: result.output,
    })
    return result
  } catch (error) {
    // Log failure
    await logAgentRun({
      conversationId,
      status: 'error',
      duration: Date.now() - startTime,
      error: error instanceof Error ? error.message : 'Unknown error',
      input,
    })

    // Return user-friendly errors instead of leaking internals
    if (error instanceof Error && error.message.includes('timeout')) {
      return {
        output:
          'The request took too long to process. Please try a simpler question or try again later.',
      }
    }
    if (error instanceof Error && error.message.includes('rate_limit')) {
      return {
        output:
          'The AI service is currently experiencing high demand. Please wait a moment and try again.',
      }
    }
    return {
      output:
        'I encountered an error processing your request. A support ticket has been created automatically.',
    }
  }
}

async function logAgentRun(data: Record<string, unknown>) {
  const supabase = await createClient()
  await supabase.from('agent_runs').insert(data)
}

// ---- Health Check Endpoint ----
// File: app/api/health/route.ts
export async function GET() {
  const checks = {
    openai: false,
    supabase: false,
  }

  // Check OpenAI
  try {
    const response = await fetch('https://api.openai.com/v1/models', {
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    })
    checks.openai = response.ok
  } catch {
    checks.openai = false
  }

  // Check Supabase
  try {
    const supabase = await createClient()
    const { error } = await supabase.from('conversations').select('id').limit(1)
    checks.supabase = !error
  } catch {
    checks.supabase = false
  }

  const healthy = Object.values(checks).every(Boolean)
  return Response.json(
    { status: healthy ? 'healthy' : 'degraded', checks },
    { status: healthy ? 200 : 503 }
  )
}
```
- Vercel Serverless Functions have a 10-second timeout (300s on Pro); agent loops with multiple tool calls may exceed this
- Use the Edge Runtime for long-running streamed agents: no fixed execution timeout, but no Node.js APIs
- Rate limiting prevents abuse; 10 messages per minute per user is a reasonable default
- Health checks verify OpenAI and Supabase connectivity and return 503 if either is down
- Log every agent run with duration, steps, and errors; observability is critical for debugging
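The logAgentRun helper above writes to an agent_runs table. A possible schema, with column names matching the fields logged by the error-handling wrapper (types are a suggestion, not from the article's migrations):

```sql
-- Hypothetical agent_runs table for observability logging
create table agent_runs (
  id              uuid primary key default gen_random_uuid(),
  conversation_id text not null,
  status          text not null check (status in ('success', 'error')),
  duration        integer,   -- milliseconds
  steps           integer,   -- intermediate tool-call steps
  input           text,
  output          text,
  error           text,
  created_at      timestamptz not null default now()
);
```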
| Framework | Language | Agent Type | Streaming | Tool Ecosystem | Production Ready | Best For |
|---|---|---|---|---|---|---|
| LangChain | Python, JS/TS | ReAct, Tool Calling | Yes (with streamEvents) | Large | Yes* | Complex multi-tool agents, RAG pipelines |
| Vercel AI SDK | JS/TS | OpenAI Functions | Yes (native) | Small | Yes | Simple chat agents, streaming-first apps |
| CrewAI | Python | Role-based multi-agent | Limited | Medium | Growing | Multi-agent collaboration, research tasks |
| AutoGen | Python | Conversational multi-agent | Yes | Medium | Yes | Multi-agent conversations, code generation |
| Direct OpenAI API | Any | Function calling | Yes | None (manual) | Yes | Simple single-tool agents, full control |
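The last row, driving the OpenAI API directly, mostly means writing the tool-call dispatch loop yourself. A framework-free sketch of the dispatch step (the ToolCall shape mirrors the Chat Completions response format; getWeather is a made-up tool):

```typescript
// Manual tool-call dispatch, as you'd write against the raw OpenAI API.
type ToolCall = {
  id: string
  function: { name: string; arguments: string } // arguments is a JSON string
}

// Local handlers keyed by tool name; getWeather stands in for a real API call.
const handlers: Record<string, (args: any) => string> = {
  getWeather: ({ city }) => `Sunny in ${city}`,
}

// Given the model's tool_calls, run each handler and build the `tool` role
// messages you append to the conversation before calling the model again.
export function dispatchToolCalls(calls: ToolCall[]) {
  return calls.map((call) => ({
    role: 'tool' as const,
    tool_call_id: call.id,
    content:
      handlers[call.function.name]?.(JSON.parse(call.function.arguments)) ??
      `Unknown tool: ${call.function.name}`,
  }))
}
```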
🎯 Key Takeaways
- An agent decides tool sequences dynamically, unlike chains, which follow fixed steps
- maxSteps on the agent prevents infinite loops; always set it to 10 in production
- Supabase stores conversation history and vector embeddings; the database is the agent's memory
- Conversation summarization reduces token costs by 80-90% while preserving context
- Tool descriptions guide the LLM's tool selection; write them like instructions, not documentation
- Validate tool inputs with Zod; LLMs hallucinate invalid arguments in multi-step workflows
Common Mistakes to Avoid
Interview Questions on This Topic
- Q: What is the difference between an AI agent and a chain in LangChain? (Mid-level)
- Q: How do you prevent an AI agent from entering an infinite loop? (Senior)
- Q: How do you manage conversation memory in a stateless serverless environment? (Mid-level)
- Q: How do you control token costs for an AI agent in production? (Senior)
- Q: What is the role of Supabase in an AI agent architecture? (Junior)
Frequently Asked Questions
Can I use a different LLM instead of OpenAI with LangChain?
Yes. LangChain supports many LLM providers: Anthropic (ChatAnthropic), Google (ChatGoogleGenerativeAI), Azure OpenAI (AzureChatOpenAI), Ollama (ChatOllama for local models), and any OpenAI-compatible API. The agent framework works the same regardless of the LLM; only the model initialization changes. For function calling (tool use), ensure the model supports it: OpenAI and Anthropic models have native support, while others may require prompt-based tool calling.
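For example, swapping in Anthropic changes only the model construction; the package and class names are LangChain's official integrations, and the model id is an example:

```typescript
// Only this initialization changes; the agent, tools, and memory stay the same.
import { ChatAnthropic } from '@langchain/anthropic'

const model = new ChatAnthropic({
  model: 'claude-3-5-sonnet-latest', // example model id
  temperature: 0,
})

// The OpenAI equivalent it replaces:
// const model = new ChatOpenAI({ model: 'gpt-4o', temperature: 0 })
```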
How do I test an AI agent in my CI/CD pipeline?
Test at three levels: unit test individual tools (mock external calls, verify output format), integration test the agent with a fixed input (verify it calls the expected tools and produces a correct answer), and end-to-end test the API endpoint (verify streaming, authentication, and error handling). For deterministic tests, use temperature=0 and record/replay LLM responses with tools like VCR or Polly. Mock the Supabase client for database operations.
How do I handle multi-agent architectures where agents delegate to each other?
LangChain supports agent delegation through the agent's tools β one agent can be wrapped as a tool for another agent. The supervisor agent decides which specialist agent to call (e.g., a billing agent, a technical support agent) and passes the relevant context. CrewAI and AutoGen provide higher-level abstractions for multi-agent collaboration with role definitions and conversation patterns.
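Stripped of framework details, the pattern looks like this: each specialist exposes the same invoke() interface and is surfaced to the supervisor as a tool. Names and the one-line "agent" body are illustrative; in LangChain you would wrap each real agent with tool():

```typescript
// A specialist agent reduced to the interface the supervisor needs.
type Specialist = {
  name: string
  description: string
  invoke: (input: string) => Promise<string>
}

const billing: Specialist = {
  name: 'billing_agent',
  description: 'Handles invoices and refunds. Input: the user question verbatim.',
  invoke: async (input) => `billing answer for: ${input}`, // stand-in for a real agent run
}

// The supervisor's tool list: each specialist agent becomes one callable tool,
// and the supervisor's LLM picks among them by description.
export const specialistTools = [billing].map((s) => ({
  name: s.name,
  description: s.description,
  call: (input: string) => s.invoke(input),
}))
```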
What is the difference between gpt-4o and gpt-4o-mini for agent tasks?
gpt-4o is better at complex reasoning, multi-step planning, and understanding nuanced tool descriptions. gpt-4o-mini is faster and roughly 17x cheaper on input tokens but may struggle with complex tool selection or multi-step workflows. Use gpt-4o-mini for simple tasks (single tool calls, order lookups, basic searches) and gpt-4o for complex tasks (multi-tool workflows, refund calculations). The cost difference is significant: $0.15/M vs $2.50/M input tokens.
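Using the input prices quoted above, the difference is easy to put in dollars (output tokens cost more and are omitted here for simplicity):

```typescript
// Back-of-envelope input-token cost, from the per-million prices quoted above.
const PRICE_PER_M_INPUT = { 'gpt-4o-mini': 0.15, 'gpt-4o': 2.5 }

export function inputCostUSD(
  model: keyof typeof PRICE_PER_M_INPUT,
  tokens: number
) {
  return (tokens / 1_000_000) * PRICE_PER_M_INPUT[model]
}

// 1,000 conversations x 5,000 input tokens each = 5M tokens:
// gpt-4o: $12.50, gpt-4o-mini: $0.75
const monthlyTokens = 1_000 * 5_000
```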
How do I deploy an AI agent that exceeds Vercel's serverless timeout?
Three options: use Vercel Edge Runtime (no fixed timeout for streamed responses, but no Node.js APIs, so it works for pure API calls), move the agent logic to a separate service (AWS Lambda with a 15-minute timeout, or a containerized service on ECS/Fargate with no timeout), or implement a queue-based architecture (user submits a message, a worker processes it asynchronously, the client polls or receives a webhook when done). The queue approach is the most scalable but adds complexity.
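The queue-based option can be sketched as follows: the chat route enqueues a job and returns immediately, a worker runs the agent, and the client polls a status route. The in-memory Map is for illustration only; serverless instances don't share memory, so production would use a durable queue (e.g. SQS or Upstash QStash) plus a jobs table in Supabase:

```typescript
import { randomUUID } from 'node:crypto'

type Job = { id: string; status: 'queued' | 'running' | 'done'; output?: string }
const jobs = new Map<string, Job>() // illustration only; use durable storage

// POST /api/chat would call this and return the job id immediately.
export function enqueue(input: string): string {
  const id = randomUUID()
  jobs.set(id, { id, status: 'queued' })
  // Worker side (normally a separate process consuming the queue):
  void (async () => {
    jobs.set(id, { id, status: 'running' })
    await new Promise((r) => setTimeout(r, 10)) // stand-in for the agent run
    jobs.set(id, { id, status: 'done', output: `agent answer for: ${input}` })
  })()
  return id
}

// What GET /api/jobs/[id] would return for polling.
export const getJob = (id: string): Job | undefined => jobs.get(id)
```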
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.