Advanced 8 min · April 12, 2026

AI Agent Infinite Loops — LangChain + Next.js Cost Control

Q: Can I use a different LLM instead of OpenAI with LangChain?

Yes. LangChain supports many LLM providers: Anthropic (ChatAnthropic), Google (ChatGoogleGenerativeAI), Azure OpenAI (AzureChatOpenAI), Ollama (ChatOllama for local models), and any OpenAI-compatible API. The agent framework works the same regardless of the LLM — only the model initialization changes. For function calling (tool use), ensure the model supports it — OpenAI and Anthropic models have native support, while others may require prompt-based tool calling.

Q: How do I test an AI agent in my CI/CD pipeline?

Test at three levels: unit test individual tools (mock external calls, verify output format), integration test the agent with a fixed input (verify it calls the expected tools and produces a correct answer), and end-to-end test the API endpoint (verify streaming, authentication, and error handling). For deterministic tests, use temperature=0 and record/replay LLM responses with tools like VCR or Polly. Mock the Supabase client for database operations.

Q: How do I handle multi-agent architectures where agents delegate to each other?

LangChain supports agent delegation through the agent's tools — one agent can be wrapped as a tool for another agent. The supervisor agent decides which specialist agent to call (e.g., a billing agent, a technical support agent) and passes the relevant context. CrewAI and AutoGen provide higher-level abstractions for multi-agent collaboration with role definitions and conversation patterns.

Q: What is the difference between gpt-4o and gpt-4o-mini for agent tasks?

gpt-4o is better at complex reasoning, multi-step planning, and understanding nuanced tool descriptions. gpt-4o-mini is faster and roughly 17x cheaper on input tokens, but may struggle with complex tool selection or multi-step workflows. Use gpt-4o-mini for simple tasks (single tool calls, order lookups, basic searches) and gpt-4o for complex tasks (multi-tool workflows, refund calculations). The cost difference is significant: $0.15/M vs $2.50/M input tokens.
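To make the price gap concrete, here is a small sketch that estimates input-token cost for one agent run using the two input prices quoted above. The workload numbers (4,000 input tokens per call, 5 calls per user message) are assumptions chosen for illustration, and output-token pricing is deliberately left out.

```typescript
// Input-token prices quoted in the answer above, in USD per million tokens.
const INPUT_PRICE_PER_MILLION = {
  'gpt-4o': 2.5,
  'gpt-4o-mini': 0.15,
} as const

type ModelName = keyof typeof INPUT_PRICE_PER_MILLION

// Estimated input cost for one agent run: every iteration re-sends the context,
// so the token count is multiplied by the number of LLM calls.
function estimateInputCost(model: ModelName, tokensPerCall: number, calls: number): number {
  const totalTokens = tokensPerCall * calls
  return (totalTokens / 1_000_000) * INPUT_PRICE_PER_MILLION[model]
}

// Hypothetical workload: 4,000 input tokens per call, 5 calls per user message.
const gpt4oCost = estimateInputCost('gpt-4o', 4_000, 5)
const miniCost = estimateInputCost('gpt-4o-mini', 4_000, 5)

console.log(`gpt-4o: $${gpt4oCost.toFixed(4)}, gpt-4o-mini: $${miniCost.toFixed(4)}`)
console.log(`ratio: ${(gpt4oCost / miniCost).toFixed(1)}x`)
```

Because the ratio depends only on the two prices, it stays at about 16.7x for any workload; what changes with conversation length is the absolute spend.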

Q: How do I deploy an AI agent that exceeds Vercel's serverless timeout?

Three options: use the Vercel Edge Runtime (suited to streaming responses from pure API calls, but without Node.js APIs; it has its own response-time limits, so check Vercel's current documentation), move the agent logic to a separate service (AWS Lambda with a 15-minute timeout, or a containerized service on ECS/Fargate with no timeout), or implement a queue-based architecture (the user submits a message, a worker processes it asynchronously, and the client polls or receives a webhook when done). The queue approach is the most scalable but adds complexity.
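The queue-based option can be sketched in a few lines. This is a minimal in-memory illustration of the submit, process, poll protocol only: the `jobs` map, the `pending` array, and the synchronous `runAgent` callback are all assumptions standing in for a real queue, a database table, and an awaited agent call.

```typescript
type JobStatus = 'queued' | 'done' | 'failed'
type Job = { id: string; message: string; status: JobStatus; result?: string }

// In-memory stand-ins for a real queue and job table (assumptions for the sketch).
const jobs = new Map<string, Job>()
const pending: string[] = []
let nextJobNumber = 1

// Called by the API route: enqueue the message and return a job ID immediately.
function submitMessage(message: string): string {
  const id = `job-${nextJobNumber++}`
  jobs.set(id, { id, message, status: 'queued' })
  pending.push(id)
  return id
}

// Called by the worker: process one job outside the request/response cycle.
// Returns false when there is nothing to process.
function processNextJob(runAgent: (message: string) => string): boolean {
  const id = pending.shift()
  if (id === undefined) return false
  const job = jobs.get(id)
  if (!job) return false
  try {
    jobs.set(id, { ...job, status: 'done', result: runAgent(job.message) })
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err)
    jobs.set(id, { ...job, status: 'failed', result: reason })
  }
  return true
}

// Called by the client when polling for completion.
function getJob(id: string): Job | undefined {
  return jobs.get(id)
}

const jobId = submitMessage('Where is my order?')
processNextJob((message) => `Echo: ${message}`)
console.log(getJob(jobId)?.status, getJob(jobId)?.result)
```

The request that submits the message returns as soon as the job is stored, so the serverless timeout applies only to the enqueue step, not to the agent run.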

One missing maxIterations caused 14,000 tool calls and $4,200 in OpenAI spend.

Naren Founder & Principal Engineer

20+ years shipping production JavaScript and front-end systems at scale. Notes here come from systems that actually shipped.

Production tested · Last updated July 04, 2026

Before you start (⏱ 30 min)

✓ Deep production experience
✓ Understanding of internals and trade-offs
✓ Experience debugging complex systems

⚡Quick Answer

An AI Agent uses an LLM to decide which tools to call and in what order — unlike a chain, it reasons about actions
LangChain provides the agent framework: createToolCallingAgent builds the agent, AgentExecutor runs the reason, act, observe loop, and RunnableWithMessageHistory adds conversation memory
Supabase stores conversation history and vector embeddings for long-term memory
Next.js 16 Route Handlers stream responses via the Vercel AI SDK (streamText); Server Actions handle non-streaming requests
Token costs scale with conversation length — truncate or summarize history to control spend
Biggest mistake: no tool guardrails. Agents can call tools indefinitely without maxIterations or deduplication

✦ Definition~90s read

What is an AI agent loop?

An AI agent loop is the runtime cycle where an LLM repeatedly decides, acts, and observes until it reaches a final answer — but without guardrails, it becomes a money incinerator. Each iteration calls the model, processes tool outputs, and re-invokes the LLM, burning tokens on every turn.

In a LangChain agent hooked into a Next.js app, a single user query can trigger dozens of loops, each costing fractions of a cent that compound into real bills. The problem isn't the loop itself — it's unbounded loops, where the agent gets stuck rethinking, re-calling tools, or generating verbose intermediate reasoning without ever terminating.

This article walks you through building an agent that uses Supabase for persistent memory and vector search, Next.js Route Handlers for streaming responses, and explicit cost controls like token budgets, loop limits, and conversation summarization. You'll learn to instrument every step, cap spend per session, and detect infinite loops before they drain your API credits.

Plain-English First

An AI Agent is like a smart assistant with a toolbox. Instead of just answering questions, it decides which tool to use — searching the web, querying a database, running calculations — and uses the result to inform its next step. LangChain is the framework that manages this decision loop. Supabase stores the conversation so the agent remembers what happened earlier. Next.js serves the interface and streams the agent's responses back to the user in real time.

Building an AI agent requires orchestrating four components: the LLM (reasoning engine), tools (external capabilities), memory (conversation history), and a runtime loop (agent executor). LangChain provides the agent framework with ReAct prompting — the model reasons about what to do, selects a tool, observes the result, and decides the next step.

Supabase serves two core roles: PostgreSQL stores conversation history for session continuity, and pgvector stores embeddings for semantic search over past interactions. Next.js 16 provides the API layer: Server Actions and Route Handlers process requests, and the Vercel AI SDK streams token-by-token responses to the client.

The production challenges are cost control (token usage scales with history length), latency (multi-step agent loops add sequential delay), reliability (tools can fail, LLMs can hallucinate tool inputs), and observability (debugging why an agent chose a specific action). This guide covers the complete implementation with production patterns for each.

What an AI Agent Loop Actually Is — and Why It Burns Money

An AI agent loop is an iterative cycle in which an LLM generates actions, those actions trigger tool calls or API requests, and the output feeds back into the LLM for the next decision. In a Next.js + LangChain + Supabase stack, this loop lives server-side (often in a Route Handler or Server Action) and persists state in Supabase: PostgreSQL tables for conversation history and tool results, pgvector for embeddings. The core mechanic: the LLM decides whether to continue or stop based on a system prompt and accumulated context, but without hard cost guards it can spin indefinitely, with each turn costing tokens and adding latency.

In practice, the loop runs inside LangChain's AgentExecutor. Each iteration: (1) LLM call with current conversation + tool results, (2) parse output for action or final answer, (3) if action, execute tool (e.g., Supabase query, external API), (4) append result to memory, (5) repeat. The critical property is that the LLM has no inherent stop condition; it relies on prompt instructions or a max-iteration parameter. Without explicit cost control, a single user request can trigger 20+ LLM calls. At $0.01–$0.03 per call, that is $0.20–$0.60 or more per request.

Use this pattern when you need autonomous multi-step reasoning: data enrichment pipelines, customer support triage, or research assistants. It matters in production because naive loops are the #1 cause of runaway costs in LLM applications. A single buggy tool response (e.g., a Supabase query returning an unexpected null) can cause the agent to retry or hallucinate indefinitely, turning a $0.05 request into a $5.00 bill. Always pair the loop with a hard iteration cap (e.g., 10 turns) and a token budget per turn.
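The iteration cap and token budget described above can be sketched without any framework. This is an illustrative loop, not LangChain's AgentExecutor: `Decide` and `RunTool` are hypothetical stand-ins for the LLM call and tool execution, and the default limits (10 iterations, 20,000 tokens) are assumptions.

```typescript
// One decision from the model: call a tool, or finish with an answer.
type Decision =
  | { kind: 'tool'; name: string; input: string; tokensUsed: number }
  | { kind: 'final'; answer: string; tokensUsed: number }

type LoopResult = {
  answer: string
  iterations: number
  totalTokens: number
  stoppedBy: 'final' | 'maxIterations' | 'tokenBudget'
}

// Hypothetical stand-ins for the LLM call and tool execution.
type Decide = (observations: string[]) => Promise<Decision>
type RunTool = (name: string, input: string) => Promise<string>

async function runBoundedLoop(
  decide: Decide,
  runTool: RunTool,
  maxIterations = 10,
  tokenBudget = 20_000,
): Promise<LoopResult> {
  const observations: string[] = []
  let totalTokens = 0

  for (let i = 1; i <= maxIterations; i++) {
    const decision = await decide(observations)
    totalTokens += decision.tokensUsed

    if (decision.kind === 'final') {
      return { answer: decision.answer, iterations: i, totalTokens, stoppedBy: 'final' }
    }
    // Check the budget before paying for another tool call and LLM turn.
    if (totalTokens >= tokenBudget) {
      return { answer: 'Stopped: token budget exhausted.', iterations: i, totalTokens, stoppedBy: 'tokenBudget' }
    }
    observations.push(await runTool(decision.name, decision.input))
  }

  return { answer: 'Stopped: iteration cap reached.', iterations: maxIterations, totalTokens, stoppedBy: 'maxIterations' }
}

// A model that never finishes: the cap, not the model, ends the loop.
const stuckModel: Decide = async () => ({ kind: 'tool', name: 'lookup', input: 'user-42', tokensUsed: 500 })
const emptyTool: RunTool = async () => 'no data'

runBoundedLoop(stuckModel, emptyTool).then((result) => {
  console.log(result.stoppedBy, result.iterations, result.totalTokens)
})
```

The point of the sketch is that both stop conditions live in code. The model's own willingness to stop is treated as a best case, never as the guarantee.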

The Infinite Loop Is Not a Bug — It's a Feature of the Architecture

The LLM will happily keep calling tools forever if you let it. The loop doesn't 'break' — it just keeps spending. You must enforce a max-iteration limit and a total token budget.

Production Insight

A customer support agent with a Supabase tool that returns empty results for missing user IDs caused the agent to retry the same query 47 times in one session, burning $1.20 in LLM calls before the user canceled.

Symptom: a single user request generates 30+ LLM calls in server logs, all repeating the same tool call pattern.

Rule of thumb: always set maxIterations ≤ 10 and wrap each tool call in a try-catch that returns a clear 'no data' message to break the loop.
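A sketch of the try-catch wrapper from the rule of thumb above. The wrapper name `withSafeResult`, the message wording, and the `lookupUser` tool are hypothetical; the idea is only that the agent always receives a plain-language observation telling it not to retry.

```typescript
type ToolFn<I> = (input: I) => Promise<string | null | undefined>

// Wraps a tool so the agent always receives a readable observation,
// even when the underlying call throws or returns nothing.
function withSafeResult<I>(name: string, fn: ToolFn<I>): (input: I) => Promise<string> {
  return async (input: I) => {
    try {
      const result = await fn(input)
      if (result === null || result === undefined || result.trim() === '') {
        return `${name}: no data found for this input. Do not retry with the same input; tell the user what is missing.`
      }
      return result
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err)
      return `${name}: failed (${reason}). Do not retry; explain the problem to the user.`
    }
  }
}

// Hypothetical tool: returns null for unknown users and throws on a simulated outage.
const lookupUser = withSafeResult<string>('lookup_user', async (userId) => {
  if (userId === 'outage') throw new Error('database timeout')
  return userId === 'missing' ? null : `user:${userId}`
})

lookupUser('missing').then((observation) => console.log(observation))
```

An empty result and a thrown error both become explicit text, so the model has something to reason about other than calling the same tool again.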

Key Takeaway

An AI agent loop is a recursive LLM-tool cycle that must be bounded by code, not by the model's will.

Without a hard iteration cap and token budget, a single request can cost more than a month of traditional API usage.

Always log iteration count and cumulative token cost per session — treat them as critical metrics, not debug info.

thecodeforge.io

Build Ai Agent Next Js Langchain Supabase

Architecture: Agent, Tools, Memory, and Runtime

An AI agent has four components that work together in a loop. The LLM is the reasoning engine: it reads the conversation history and tool descriptions, decides which tool to call, and interprets the result. Tools are external functions the agent can invoke, such as database queries, code execution, and API calls. Memory stores conversation history so the agent maintains context across turns. The runtime (AgentExecutor) orchestrates the loop: send context to the LLM, parse the tool call, execute the tool, append the result, repeat.

The key difference between an agent and a chain: a chain follows a fixed sequence of steps (retrieve -> format -> generate). An agent decides the sequence dynamically — it might search, then calculate, then search again, then answer. This flexibility comes at a cost: each step is a separate LLM call, adding latency and token usage.

LangChain implements this pattern. A classic ReAct prompt instructs the LLM to Think (reason about what to do), Act (call a tool with specific inputs), and Observe (read the tool result); tool-calling agents follow the same cycle through the model's native tool-calling API. The loop continues until the agent decides it has enough information to answer, or until AgentExecutor's maxIterations limit is reached.

io.thecodeforge.ai-agent.architecture.ts · TypeScript

// ============================================
// AI Agent Architecture — Core Components (2026 LangChain 0.3+)
// ============================================

// ---- Component 1: LLM (Reasoning Engine) ----
// The LLM decides which tools to call and interprets results
// Note: searchDocuments, getOrderFromDatabase, and supabase are application
// helpers assumed to be defined elsewhere in the project.

import { ChatOpenAI } from '@langchain/openai'

const llm = new ChatOpenAI({
  model: 'gpt-4o',
  temperature: 0, // Low-variance output for tool calling; not a guarantee of determinism
  apiKey: process.env.OPENAI_API_KEY,
})

// ---- Component 2: Tools (External Capabilities) ----
// Each tool has a name, description, schema, and execution function

import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: 'Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.',
  schema: z.object({
    query: z.string().describe('The search query — use keywords from the user question'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general']).optional()
      .describe('Filter by category if the topic is clear'),
  }),
  func: async ({ query, category }) => {
    // Search Supabase vector store
    const results = await searchDocuments(query, category)
    if (results.length === 0) {
      return 'No results found. Try a different search query or answer from general knowledge.'
    }
    return results.map((r) => `[${r.title}]: ${r.content}`).join('\n\n')
  },
})

const getOrderStatus = new DynamicStructuredTool({
  name: 'get_order_status',
  description: 'Look up the status of a customer order by order ID. Returns shipping status, estimated delivery, and tracking number.',
  schema: z.object({
    orderId: z.string().describe('The order ID — format: ORD-XXXXX'),
  }),
  func: async ({ orderId }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) {
      return `Order ${orderId} not found. Ask the user to verify the order ID.`
    }
    return JSON.stringify({
      status: order.status,
      estimatedDelivery: order.estimatedDelivery,
      trackingNumber: order.trackingNumber,
    })
  },
})

const calculateRefund = new DynamicStructuredTool({
  name: 'calculate_refund',
  description: 'Calculate the refund amount for a return. Considers order total, return reason, and days since purchase.',
  schema: z.object({
    orderId: z.string().describe('The order ID'),
    returnReason: z.enum(['defective', 'wrong_item', 'changed_mind', 'not_as_described'])
      .describe('The reason for the return'),
  }),
  func: async ({ orderId, returnReason }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) return 'Order not found.'

    const daysSincePurchase = Math.floor(
      (Date.now() - new Date(order.createdAt).getTime()) / (1000 * 60 * 60 * 24)
    )

    let refundPercentage = 1.0
    if (returnReason === 'changed_mind' && daysSincePurchase > 30) {
      refundPercentage = 0.0
    } else if (returnReason === 'changed_mind') {
      refundPercentage = 0.85
    }

    const refundAmount = order.total * refundPercentage
    return JSON.stringify({ refundAmount, refundPercentage, daysSincePurchase })
  },
})

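// ---- Component 3: Memory (Conversation History) ----
// Stored in Supabase PostgreSQL for persistence across requests
// Custom history class to match your token-tracking schema.
// NOTE: verify this import path against your installed @langchain/community
// version. If it does not export a Supabase-backed history, implement one by
// extending BaseListChatMessageHistory from '@langchain/core/chat_history'.
import { SupabaseChatMessageHistory } from '@langchain/community/stores/message/supabase'

function createMemory(conversationId: string) {
  return new SupabaseChatMessageHistory({
    supabaseClient: supabase,
    tableName: 'messages',
    sessionId: conversationId,
  })
}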

// ---- Component 4: Agent (2026 pattern) ----
// createToolCallingAgent builds the agent, AgentExecutor runs the loop and
// enforces maxIterations, RunnableWithMessageHistory adds persisted history
import { AgentExecutor, createToolCallingAgent } from 'langchain/agents'
import { RunnableWithMessageHistory } from '@langchain/core/runnables'
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'

const prompt = ChatPromptTemplate.fromMessages([
  ['system', `You are a helpful customer support agent for Acme Corp.

You have access to the following tools:
- search_knowledge_base: Search the internal knowledge base
- get_order_status: Look up order status by order ID
- calculate_refund: Calculate refund amounts for returns

Rules:
1. Always search the knowledge base before answering product questions
2. If a tool returns no new information, answer from what you know
3. Never make up order statuses; always use get_order_status
4. If you cannot answer, say so and offer to connect with a human agent
5. Maximum 10 tool calls per response; synthesize your answer after that`],
  new MessagesPlaceholder('chat_history'),
  ['human', '{input}'],
  new MessagesPlaceholder('agent_scratchpad'),
])

export async function createAgent(conversationId: string) {
  const tools = [searchKnowledgeBase, getOrderStatus, calculateRefund]

  const agent = await createToolCallingAgent({
    llm,
    tools,
    prompt,
  })

  // Hard cap enforced in code: rule 5 in the prompt is only a hint to the model
  const executor = new AgentExecutor({
    agent,
    tools,
    maxIterations: 10,
    returnIntermediateSteps: true,
  })

  const agentWithHistory = new RunnableWithMessageHistory({
    runnable: executor,
    getMessageHistory: () => createMemory(conversationId),
    inputMessagesKey: 'input',
    historyMessagesKey: 'chat_history',
    outputMessagesKey: 'output',
  })

  return agentWithHistory
}

The Agent Loop: Think, Act, Observe

Think: the LLM reads the conversation history and tool descriptions, decides which tool to call
Act: the agent calls the tool with specific inputs parsed from the LLM output
Observe: the tool result is appended to the conversation, and the LLM decides the next step
Each step is a separate LLM call — latency and cost scale with the number of iterations
maxIterations stops the loop: without it, the agent can run indefinitely and drain API credits
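Alongside an iteration cap, repeated identical tool calls can be caught directly. The sketch below counts exact (tool, arguments) repeats per conversation; the names `ToolCallTracker` and `shouldBlockCall` and the default of two allowed repeats are assumptions, and it only handles flat argument objects.

```typescript
// Counts identical tool calls within a single conversation.
class ToolCallTracker {
  private counts = new Map<string, number>()

  // Returns how many times this exact (tool, arguments) pair has been seen,
  // including the current call. Keys are sorted so argument order is ignored.
  record(toolName: string, args: Record<string, unknown>): number {
    const sortedKeys = Object.keys(args).sort()
    const key = `${toolName}:${JSON.stringify(args, sortedKeys)}`
    const seen = (this.counts.get(key) ?? 0) + 1
    this.counts.set(key, seen)
    return seen
  }
}

// One tracker per conversation, so one user's repeats never block another user.
const trackers = new Map<string, ToolCallTracker>()

function shouldBlockCall(
  conversationId: string,
  toolName: string,
  args: Record<string, unknown>,
  maxRepeats = 2,
): boolean {
  let tracker = trackers.get(conversationId)
  if (!tracker) {
    tracker = new ToolCallTracker()
    trackers.set(conversationId, tracker)
  }
  return tracker.record(toolName, args) > maxRepeats
}

// The third identical call in the same conversation is flagged.
for (let attempt = 1; attempt <= 3; attempt++) {
  const blocked = shouldBlockCall('conv-1', 'get_order_status', { orderId: 'ORD-00001' })
  console.log(`attempt ${attempt}: blocked=${blocked}`)
}
```

When a call is blocked, returning a short message such as "already tried, answer from context" gives the model a reason to stop instead of silently failing.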

Production Insight

Each agent step is a separate LLM call — 10 iterations means 10 API calls per user message.

maxIterations is the safety net: without it, agents can loop indefinitely and drain API credits.

Rule: set maxIterations to 10 for production, add cost monitoring per conversation.
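Per-conversation cost monitoring can start as a small accumulator. This sketch keeps usage in an in-memory Map, which is an assumption made for brevity: in a serverless deployment the totals would need to live in the database (the `total_tokens` column on the conversations table is the natural place). The function names and limits are hypothetical.

```typescript
type Usage = { llmCalls: number; tokens: number; costUsd: number }

// In-memory store for the sketch; persist this in the database in production.
const usageByConversation = new Map<string, Usage>()

// Record one LLM call and return the running totals for the conversation.
function recordCall(conversationId: string, tokens: number, pricePerMillionUsd: number): Usage {
  const prev = usageByConversation.get(conversationId) ?? { llmCalls: 0, tokens: 0, costUsd: 0 }
  const next: Usage = {
    llmCalls: prev.llmCalls + 1,
    tokens: prev.tokens + tokens,
    costUsd: prev.costUsd + (tokens / 1_000_000) * pricePerMillionUsd,
  }
  usageByConversation.set(conversationId, next)
  return next
}

// True when the conversation has hit either the call limit or the spend limit.
function isOverBudget(conversationId: string, maxCalls: number, maxCostUsd: number): boolean {
  const usage = usageByConversation.get(conversationId)
  if (!usage) return false
  return usage.llmCalls >= maxCalls || usage.costUsd >= maxCostUsd
}

// Three calls of 4,000 tokens each at the gpt-4o input price of $2.50 per million.
recordCall('conv-1', 4_000, 2.5)
recordCall('conv-1', 4_000, 2.5)
const totals = recordCall('conv-1', 4_000, 2.5)
console.log(totals.llmCalls, totals.tokens, totals.costUsd.toFixed(2))
console.log(isOverBudget('conv-1', 10, 0.05))
```

Checking `isOverBudget` before each LLM call turns the metric into a guard rather than a report read after the bill arrives.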

Key Takeaway

An agent decides tool sequences dynamically — unlike chains which follow fixed steps.

createToolCallingAgent + AgentExecutor + RunnableWithMessageHistory orchestrate the loop; maxIterations prevents infinite loops and cost overruns.

Each iteration is a separate LLM call — cost and latency scale with step count.

Supabase: Memory, Vector Search, and Conversation Storage

Supabase serves three roles in the agent architecture. PostgreSQL stores conversation history — every message (user and assistant) is persisted for session continuity. pgvector stores document embeddings for semantic search — the agent can retrieve relevant knowledge base entries. The Supabase client provides real-time subscriptions if you want to show agent activity to the user in real time.

The conversation storage pattern: each conversation has a unique ID. Messages are appended to a messages table with the conversation_id as a foreign key. When the agent processes a new message, it loads the conversation history from Supabase and prepends it to the LLM context. This gives the agent memory across turns without storing state in the application server.
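Loading history does not have to mean loading all of it. The sketch below selects the most recent stored messages that fit a token budget before they are prepended to the LLM context. The `StoredMessage` shape mirrors the messages table columns, while the function name and the 750-token budget in the example are assumptions.

```typescript
type StoredMessage = {
  role: 'user' | 'assistant' | 'system' | 'tool'
  content: string
  tokenCount: number
}

// Keeps the most recent messages whose combined token count fits the budget,
// returned in chronological order. Expects `messages` sorted oldest first.
function selectRecentHistory(messages: StoredMessage[], maxTokens: number): StoredMessage[] {
  const kept: StoredMessage[] = []
  let used = 0
  for (const message of messages.slice().reverse()) {
    if (used + message.tokenCount > maxTokens) break
    kept.unshift(message)
    used += message.tokenCount
  }
  return kept
}

const history: StoredMessage[] = [
  { role: 'user', content: 'Hi', tokenCount: 100 },
  { role: 'assistant', content: 'Hello, how can I help?', tokenCount: 200 },
  { role: 'user', content: 'Where is order ORD-12345?', tokenCount: 300 },
  { role: 'assistant', content: 'It shipped yesterday.', tokenCount: 400 },
]

const context = selectRecentHistory(history, 750)
console.log(context.map((m) => m.content))
```

Plain truncation like this drops older turns entirely; summarizing them instead preserves more context at a similar token cost.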

Vector search uses pgvector's cosine similarity. Documents are chunked, embedded with OpenAI's text-embedding-3-small model, and stored with their embedding vectors. The match_documents RPC function performs nearest-neighbor search and returns the top-k results above a similarity threshold. The suggested range for text-embedding-3-small is 0.78–0.82; treat it as a starting point and calibrate against your own documents, because similarity scores vary by embedding model and corpus.
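For intuition, here is the same threshold-then-top-k logic in plain TypeScript. It is a sketch of what a function like match_documents computes, not a replacement for pgvector: the two-dimensional vectors and document titles are toy data, whereas real embeddings have 1,536 dimensions.

```typescript
type Doc = { title: string; embedding: number[] }
type Match = { title: string; similarity: number }

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((x, i) => {
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  })
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Filter by threshold, sort by similarity (highest first), keep the top k.
function matchDocuments(query: number[], docs: Doc[], matchCount = 5, matchThreshold = 0.8): Match[] {
  return docs
    .map((doc) => ({ title: doc.title, similarity: cosineSimilarity(doc.embedding, query) }))
    .filter((match) => match.similarity > matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount)
}

// Toy data: two documents point roughly the same way as the query, two do not.
const docs: Doc[] = [
  { title: 'Unrelated', embedding: [0, 1] },
  { title: 'Close', embedding: [0.9, 0.1] },
  { title: 'Exact', embedding: [1, 0] },
  { title: 'Halfway', embedding: [1, 1] },
]

console.log(matchDocuments([1, 0], docs).map((m) => m.title))
```

The `1 - (embedding <=> query_embedding)` expression in the SQL function expresses the same quantity, since pgvector's `<=>` operator returns cosine distance.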

io.thecodeforge.ai-agent.supabase-schema.sql · SQL

-- ============================================
-- Supabase Schema for AI Agent
-- ============================================

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- ---- Conversations Table ----
-- Each conversation has a unique ID and tracks the user
CREATE TABLE public.conversations (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  title TEXT,
  total_tokens INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.conversations ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage their own conversations"
  ON public.conversations
  FOR ALL
  USING (auth.uid() = user_id)
  WITH CHECK (auth.uid() = user_id);

-- ---- Messages Table ----
-- Stores each message in a conversation
CREATE TABLE public.messages (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system', 'tool')),
  content TEXT NOT NULL,
  tool_calls JSONB, -- Stores tool call details for assistant messages
  tool_call_id TEXT, -- Links tool response to the original call
  token_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.messages ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage messages in their conversations"
  ON public.messages
  FOR ALL
  USING (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  )
  WITH CHECK (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  );

-- Index for fast conversation history retrieval
CREATE INDEX idx_messages_conversation_created
  ON public.messages (conversation_id, created_at ASC);

-- ---- Documents Table ----
-- Knowledge base with vector embeddings
CREATE TABLE public.documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  category TEXT, -- product, pricing, troubleshooting, general
  embedding vector(1536), -- text-embedding-3-small produces 1536-dim vectors
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Documents are readable by authenticated users"
  ON public.documents
  FOR SELECT
  USING (auth.role() = 'authenticated');

-- Index for vector similarity search
CREATE INDEX ON public.documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- ---- Vector Search RPC Function ----
-- Performs nearest-neighbor search using cosine similarity
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_count INTEGER DEFAULT 5,
  match_threshold FLOAT DEFAULT 0.8, -- Production value for text-embedding-3-small
  filter_category TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  category TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.title,
    documents.content,
    documents.category,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM public.documents
  WHERE
    (filter_category IS NULL OR documents.category = filter_category)
    AND 1 - (documents.embedding <=> query_embedding) > match_threshold
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

-- ---- Conversation Summaries Table ----
-- Stores summarized history for long conversations
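CREATE TABLE public.conversation_summaries (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  summary TEXT NOT NULL,
  message_count INTEGER NOT NULL, -- Number of messages summarized
  token_count INTEGER NOT NULL, -- Token count of the original messages
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Summaries contain conversation content, so they get the same RLS protection as messages
ALTER TABLE public.conversation_summaries ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage summaries of their conversations"
  ON public.conversation_summaries
  FOR ALL
  USING (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  )
  WITH CHECK (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  );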

-- ---- Updated At Trigger ----
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$;

CREATE TRIGGER conversations_updated_at
  BEFORE UPDATE ON public.conversations
  FOR EACH ROW
  EXECUTE FUNCTION update_updated_at();

Supabase as the Agent's Memory Layer

PostgreSQL stores conversation history — every message persisted for session continuity
pgvector stores document embeddings — agent retrieves relevant knowledge via semantic search
Conversation summaries compress long histories — control token costs for extended sessions
RLS policies ensure users only access their own conversations — security at the database level
The match_documents RPC function performs nearest-neighbor search with configurable threshold (0.78-0.82 recommended)

Production Insight

pgvector cosine similarity search retrieves relevant documents — threshold filters low-quality matches (use 0.78-0.82 for text-embedding-3-small).

Conversation history grows linearly with turns — summarize after 8,000 tokens to control costs.

Rule: store every message, track token counts per conversation, summarize when history is too long.
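The "summarize when history is too long" rule can be expressed as a small planning step. The sketch below decides which stored messages to hand to a summarizer and which to keep verbatim. The 8,000-token threshold comes from the text above; keeping the six most recent messages is an assumption, and the summarization call itself is out of scope.

```typescript
type Msg = { content: string; tokenCount: number }

type SummaryPlan = {
  summarize: Msg[] // older messages to compress into a summary
  keep: Msg[] // recent messages to send verbatim
  totalTokens: number
}

// Splits history once the total token count passes the threshold.
// Expects `messages` sorted oldest first.
function planSummarization(messages: Msg[], thresholdTokens = 8_000, keepRecent = 6): SummaryPlan {
  const totalTokens = messages.reduce((sum, m) => sum + m.tokenCount, 0)
  if (totalTokens <= thresholdTokens || messages.length <= keepRecent) {
    return { summarize: [], keep: messages, totalTokens }
  }
  const cut = messages.length - keepRecent
  return { summarize: messages.slice(0, cut), keep: messages.slice(cut), totalTokens }
}

// Twenty messages of 500 tokens each: 10,000 tokens, over the threshold.
const longHistory: Msg[] = Array.from({ length: 20 }, (_, i) => ({
  content: `message ${i + 1}`,
  tokenCount: 500,
}))

const plan = planSummarization(longHistory)
console.log(plan.totalTokens, plan.summarize.length, plan.keep.length)
```

The summary text, together with `summarize.length` and the summed token count, maps onto the `summary`, `message_count`, and `token_count` columns of the conversation_summaries table.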

Key Takeaway

Supabase serves three roles: conversation storage, vector search, and optional real-time updates.

pgvector match_documents RPC performs semantic search — cosine similarity with threshold filtering.

Summarize long conversations to control token costs — store summaries in a separate table.

Next.js Integration: Streaming API and Server Actions

The Next.js integration has two concerns: the API layer that processes agent requests, and the streaming layer that delivers token-by-token responses to the client. The Vercel AI SDK (ai package) handles both: it provides useChat for the client and streamText for the server (StreamingTextResponse is the legacy helper from earlier SDK versions).

The API route receives the user's message, loads conversation history from Supabase, runs the agent, and streams the response. The client uses useChat to manage the message list and display streaming tokens. The key pattern: the Route Handler returns streamText(...).toDataStreamResponse(), which is a streaming response that the client consumes incrementally.

For non-streaming work started from your own UI (form submissions, batch operations), use Server Actions instead. They await the agent's full result before returning, so they are simpler, but the user waits for the complete response. Webhook-triggered agents still need a Route Handler, because external services call an HTTP endpoint rather than a Server Action.

io.thecodeforge.ai-agent.client-component.tsx · TSX

// ============================================
// Client Component — Chat Interface with Streaming
// ============================================
// File: components/chat-interface.tsx

'use client'

// useChat ships in @ai-sdk/react in AI SDK 4+ ('ai/react' is the older import path)
import { useChat } from '@ai-sdk/react'
import { useState } from 'react'
import { Button } from '@/components/ui/button'
import { Input } from '@/components/ui/input'

export function ChatInterface({ conversationId }: { conversationId?: string }) {
  const [activeConversationId, setActiveConversationId] = useState(conversationId)

  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
    body: { conversationId: activeConversationId },
    onFinish: (message) => {
      // Response headers are not available in onFinish. If the server sends the
      // conversation ID as a response header, read it in the onResponse callback.
    },
    onError: (error) => {
      console.error('Chat error:', error)
    },
  })

  return (
    <div className="flex flex-col h-[600px] max-w-2xl mx-auto">
      {/* Messages */}
      <div className="flex-1 overflow-y-auto space-y-4 p-4">
        {messages.map((message) => (
          <div
            key={message.id}
            className={`flex ${message.role === 'user' ? 'justify-end' : 'justify-start'}`}
          >
            <div
              className={`max-w-[80%] rounded-lg px-4 py-2 ${
                message.role === 'user'
                  ? 'bg-primary text-primary-foreground'
                  : 'bg-muted'
              }`}
            >
              <p className="text-sm whitespace-pre-wrap">{message.content}</p>
            </div>
          </div>
        ))}

        {isLoading && (
          <div className="flex justify-start">
            <div className="bg-muted rounded-lg px-4 py-2">
              <div className="flex space-x-1">
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce" />
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce delay-100" />
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce delay-200" />
              </div>
            </div>
          </div>
        )}
      </div>

      {/* Input */}
      <form onSubmit={handleSubmit} className="flex gap-2 p-4 border-t">
        <Input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask a question..."
          disabled={isLoading}
          className="flex-1"
        />
        <Button type="submit" disabled={isLoading}>
          Send
        </Button>
      </form>
    </div>
  )
}

// ============================================
// Server Action — Non-Streaming Agent (Background Processing)
// ============================================
// File: app/actions/agent.ts

'use server'

import { createAgent } from '@/lib/agent/factory'
import { createClient } from '@/lib/supabase/server'

export async function processAgentMessage(
  conversationId: string,
  message: string
) {
  const supabase = await createClient()
  const { data: { user } } = await supabase.auth.getUser()

  if (!user) {
    throw new Error('Unauthorized')
  }

  // Store user message
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'user',
    content: message,
  })

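  // Run agent (non-streaming). RunnableWithMessageHistory requires a session ID in the config.
  const agent = await createAgent(conversationId)
  const result = await agent.invoke(
    { input: message },
    { configurable: { sessionId: conversationId } }
  )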

  // Store assistant response
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'assistant',
    content: result.output,
  })

  return {
    answer: result.output,
    steps: result.intermediateSteps?.length ?? 0,
  }
}

Streaming vs Server Actions: When to Use Each

Streaming (Route Handler + streamText): real-time user-facing chat — user sees tokens as they generate
Server Actions: non-streaming work started from your own UI, such as batch operations (webhooks from external services need a Route Handler)
Route Handlers must return streamText().toDataStreamResponse() — not a plain Response or JSON
useChat manages the message list and streaming state — do not manually manage messages
Server Actions cannot stream — the user waits for the full response before seeing anything

Production Insight

Route Handlers with streamText().toDataStreamResponse() enable real-time token streaming to the client.

Server Actions are simpler but cannot stream — use them for background processing only.

Rule: use Route Handlers for user-facing chat, Server Actions for background agent tasks.

Key Takeaway

Streaming requires Route Handler + streamText() + useChat on the client.

Server Actions run agents synchronously — no streaming, simpler but slower perceived performance.

Store every message in Supabase — conversation history enables session continuity.

Tools: Building and Validating Agent Capabilities

Tools are the agent's external capabilities — each tool is a function with a name, description, and input schema. The LLM reads the tool descriptions to decide which tool to call, and parses the input schema to generate valid arguments. The quality of the description directly affects the agent's ability to use the tool correctly.

The critical pattern: validate tool inputs with Zod before execution. LLMs can hallucinate invalid inputs — missing required fields, wrong types, or malformed queries. Zod validation catches these before the tool executes, returning a clear error message that the agent can learn from.

Tool design principles: descriptions should be specific (not just "search the database"), input schemas should use .describe() on every field (the LLM reads these descriptions), and error messages should be actionable (tell the agent what went wrong and how to fix it).
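To show what an actionable error looks like, here is a dependency-free sketch of input validation for the order ID argument. In the article's stack a Zod schema would do this job; the hand-written check below is a stand-in so the example runs on its own. The exact pattern (five letters or digits after `ORD-`) is an assumption based on the `ORD-XXXXX` format in the tool description.

```typescript
type Validation =
  | { ok: true; orderId: string }
  | { ok: false; message: string }

// Validates and normalizes an order ID produced by the LLM.
// Failure messages say what was wrong and what the agent should do next.
function validateOrderId(raw: unknown): Validation {
  if (typeof raw !== 'string' || raw.trim() === '') {
    return {
      ok: false,
      message: 'orderId is required. Ask the user for their order ID (format: ORD-XXXXX).',
    }
  }
  const normalized = raw.trim().toUpperCase()
  // Assumed pattern: "ORD-" followed by exactly five letters or digits.
  if (!/^ORD-[A-Z0-9]{5}$/.test(normalized)) {
    return {
      ok: false,
      message: `"${raw}" is not a valid order ID. Expected format: ORD-XXXXX (for example ORD-12345). Ask the user to check it.`,
    }
  }
  return { ok: true, orderId: normalized }
}

console.log(validateOrderId('ord-12345'))
console.log(validateOrderId('12345'))
```

Compare the failure message with a bare "invalid input": the first tells the model to go back to the user, while the second invites another guess.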

io.thecodeforge.ai-agent.tools.ts · TypeScript

// ============================================
// Tool Design Patterns
// ============================================

import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

// ---- Good Tool: Clear description, validated inputs, actionable errors + deduplication ----

export const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: `Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.
Do NOT use this tool for:
- Order status queries (use get_order_status instead)
- Calculations (use calculate tool instead)
- Questions about the current date or time`,
  schema: z.object({
    query: z.string()
      .min(3, 'Query must be at least 3 characters')
      .max(200, 'Query must be under 200 characters')
      .describe('The search query — use specific keywords from the user question. Avoid full sentences.'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general'])
      .optional()
      .describe('Filter by category if the topic is clear from the question'),
    maxResults: z.number()
      .min(1)
      .max(10)
      .default(5)
      .describe('Maximum number of results to return — use 3 for focused answers, 5 for comprehensive'),
  }),
  func: async ({ query, category, maxResults }) => {
    try {
      // Deduplication guardrail: refuse to repeat an identical search.
      // This state is shared by every request the server instance handles,
      // so include the conversation ID in the key in production.
      const seenQueries: Set<string> = ((globalThis as any).seenQueries ??= new Set<string>())
      const queryKey = JSON.stringify({ query, category })
      if (seenQueries.has(queryKey)) {
        return "You've already run this search. Answer from context or ask for clarification."
      }
      seenQueries.add(queryKey)

      // Generate embedding for the query
      const embedding = await generateEmbedding(query)

      // Search Supabase vector store
      const { data, error } = await supabase.rpc('match_documents', {
        query_embedding: embedding,
        match_count: maxResults,
        match_threshold: 0.8, // Production value
        filter_category: category ?? null,
      })

      if (error) {
        return `Search failed: ${error.message}. Try a simpler query or remove the category filter.`
      }

      if (!data || data.length === 0) {
        return `No results found for "${query}". Try:
1. Using different keywords
2. Removing the category filter
3. Broadening the search terms`
      }

      return data.map((doc: any, i: number) =>
        `[Result ${i + 1}] (similarity: ${doc.similarity.toFixed(2)})
Title: ${doc.title}
Content: ${doc.content.slice(0, 500)}...`
      ).join('\n\n')
    } catch (err) {
      return `Search error: ${err instanceof Error ? err.message : 'Unknown error'}. The user should try again or contact support.`
    }
  },
})

// ---- Good Tool: Input validation with meaningful error messages ----

export const createTicket = new DynamicStructuredTool({
  name: 'create_support_ticket',
  description: 'Create a support ticket when the agent cannot resolve the issue. Use this as a last resort after exhausting available tools.',
  schema: z.object({
    subject: z.string()
      .min(10, 'Subject must be at least 10 characters — provide a clear summary')
      .max(200)
      .describe('Brief subject line summarizing the issue'),
    description: z.string()
      .min(50, 'Description must be at least 50 characters — include what was tried and what failed')
      .max(2000)
      .describe('Detailed description including what the user tried and what went wrong'),
    priority: z.enum(['low', 'medium', 'high', 'urgent'])
      .describe('Priority based on impact: low = question, medium = minor issue, high = blocked, urgent = production down'),
    category: z.enum(['billing', 'technical', 'account', 'feature_request'])
      .describe('Category of the issue'),
  }),
  func: async ({ subject, description, priority, category }) => {
    // Validate business rules
    if (priority === 'urgent' && category !== 'technical') {
      return 'Urgent priority is only available for technical issues. Please use high priority instead.'
    }

    const ticket = await createTicketInDatabase({ subject, description, priority, category })

    return `Ticket created successfully.
Ticket ID: ${ticket.id}
Priority: ${priority}
Expected response time: ${getResponseTime(priority)}

Tell the user: "I've created a support ticket (ID: ${ticket.id}) for you. A team member will respond within ${getResponseTime(priority)}."`
  },
})

// ---- Helper: Generate embedding ----
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  })
  return response.data[0].embedding
}

function getResponseTime(priority: string): string {
  const times: Record<string, string> = {
    low: '48 hours',
    medium: '24 hours',
    high: '4 hours',
    urgent: '1 hour',
  }
  return times[priority] ?? '24 hours'
}
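
The deduplication guardrail in searchKnowledgeBase keeps its state globally, so one user's search can suppress another's. A minimal sketch of a per-conversation version follows. The `SeenQueries` class and its method names are hypothetical, and the sketch assumes the tool can obtain a conversation ID from its caller.

```typescript
// Hypothetical per-conversation tracker; the class name and API are assumptions.
class SeenQueries {
  private readonly seen = new Map<string, Set<string>>()

  // Returns true the first time a (query, category) pair is seen in a conversation
  markIfNew(conversationId: string, query: string, category?: string): boolean {
    // Normalize so trivial rewordings ("Refund Policy " vs "refund policy") count as repeats
    const key = JSON.stringify([query.trim().toLowerCase(), category ?? null])
    let keys = this.seen.get(conversationId)
    if (!keys) {
      keys = new Set<string>()
      this.seen.set(conversationId, keys)
    }
    if (keys.has(key)) return false
    keys.add(key)
    return true
  }

  // Call when the conversation ends so the map does not grow without bound
  clear(conversationId: string): void {
    this.seen.delete(conversationId)
  }
}
```

In a serverless deployment this in-memory map lives only as long as one instance, so treat it as a best-effort loop breaker. For a durable guardrail, the same keys could be stored in Redis or Supabase.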

Tool Design Best Practices

Descriptions must explain WHEN to use the tool — not just what it does
Every input field needs .describe() — the LLM reads these to generate valid arguments
Validate inputs with Zod before execution — LLMs can hallucinate invalid inputs
Error messages must be actionable — tell the agent what went wrong and how to retry
Return formatted text, not raw JSON — LLMs parse natural language better than JSON

Production Insight

Tool descriptions are the LLM's instructions — vague descriptions cause wrong tool selection.

Zod validation catches hallucinated inputs before execution — return clear error messages.

Rule: describe when to use the tool, validate every input, return actionable error messages.

Key Takeaway

Tool quality determines agent quality — descriptions guide the LLM's tool selection.

Validate inputs with Zod — LLMs hallucinate invalid arguments, especially in multi-step workflows.

Error messages must guide the agent — tell it what failed and how to retry with different inputs.

Cost Control: Token Management and Conversation Summarization

Token costs are the primary production concern for AI agents. Each agent iteration is a separate LLM call that includes the full conversation history, tool descriptions, and the agent's reasoning. A 20-turn conversation with 10 tool calls per turn is roughly 200 LLM calls. Because every call resends the accumulated history, total input can run from 100,000 tokens into the millions, which puts a single conversation in the $1-10 range on gpt-4o.

Three strategies control costs: history truncation (keep only the last N messages), conversation summarization (replace old messages with a summary), and model selection (use gpt-4o-mini for simple tasks, gpt-4o for complex reasoning). Summarization is the most effective — it preserves context while reducing token count by 80-90%.
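
The compounding effect is easier to see with numbers. The sketch below is a worked estimate, not a measurement: the base prompt size (1,000 tokens) and the growth per call (150 tokens) are assumed values, and the prices are the input rates quoted in this article.

```typescript
// Assumed, illustrative model: every LLM call resends a fixed prompt (system
// prompt + tool descriptions) plus all history accumulated by earlier calls.
function cumulativeInputTokens(calls: number, basePromptTokens: number, tokensAddedPerCall: number): number {
  let total = 0
  for (let i = 0; i < calls; i++) {
    // Call i sees the base prompt plus everything added by the i earlier calls
    total += basePromptTokens + i * tokensAddedPerCall
  }
  return total
}

// Input prices per million tokens as quoted in this article
const INPUT_PRICE_PER_MILLION = { 'gpt-4o': 2.5, 'gpt-4o-mini': 0.15 }

function inputCostUsd(tokens: number, model: keyof typeof INPUT_PRICE_PER_MILLION): number {
  return (tokens / 1_000_000) * INPUT_PRICE_PER_MILLION[model]
}

const totalInputTokens = cumulativeInputTokens(200, 1_000, 150)
console.log(totalInputTokens) // 3185000
console.log(inputCostUsd(totalInputTokens, 'gpt-4o').toFixed(2)) // "7.96"
console.log(inputCostUsd(totalInputTokens, 'gpt-4o-mini').toFixed(2)) // "0.48"
```

Under these assumptions, 200 calls add up to about 3.2 million input tokens even though no single prompt exceeds 31,000 tokens. That quadratic growth is what truncation and summarization are meant to flatten.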

io.thecodeforge.ai-agent.cost-control.ts (TypeScript)

// ============================================
// Cost Control: Token Management
// ============================================

import { ChatOpenAI } from '@langchain/openai'
import { createClient } from '@/lib/supabase/server'

// ---- Strategy 1: Conversation Summarization ----
// Summarize old messages when the conversation gets too long
// Reduces token count by 80-90% while preserving context

export async function getConversationHistory(conversationId: string, maxTokens: number = 8000) {
  const supabase = await createClient()

  // Get the latest summary (if any)
  const { data: summary } = await supabase
    .from('conversation_summaries')
    .select('summary, message_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: false })
    .limit(1)
    .single()

  // Get messages after the summary
  const offset = summary?.message_count ?? 0
  const { data: recentMessages } = await supabase
    .from('messages')
    .select('role, content, token_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: true })
    .range(offset, offset + 49)

  if (!recentMessages) return []

  // Build the history
  const history: { role: string; content: string }[] = []

  // Add summary as a system message if it exists
  if (summary) {
    history.push({
      role: 'system',
      content: `[Previous conversation summary]: ${summary.summary}`,
    })
  }

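  // Add recent messages, walking from newest to oldest so the latest turns
  // are the ones that survive when the token budget runs out
  let totalTokens = summary ? estimateTokens(summary.summary) : 0
  const kept: { role: string; content: string }[] = []
  for (const msg of [...recentMessages].reverse()) {
    if (totalTokens + (msg.token_count ?? 0) > maxTokens) {
      break // Budget exhausted: drop this message and everything older
    }
    kept.unshift({ role: msg.role, content: msg.content }) // keep chronological order
    totalTokens += msg.token_count ?? 0
  }
  history.push(...kept)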

  return history
}

// ---- Strategy 2: Model Selection Based on Task Complexity ----
// Use cheaper models for simple tasks, expensive models for complex reasoning

export function selectModel(taskComplexity: 'simple' | 'moderate' | 'complex') {
  const models = {
    simple: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 500, // Limit output tokens for simple responses
    }),
    moderate: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 1000,
    }),
    complex: new ChatOpenAI({
      modelName: 'gpt-4o',
      temperature: 0,
      maxTokens: 2000,
    }),
  }
  return models[taskComplexity]
}

// ---- Strategy 3: Token Budget Per Conversation ----
// Track and enforce token limits per conversation

export async function checkTokenBudget(conversationId: string): Promise<boolean> {
  const supabase = await createClient()
  const SUMMARIZE_AT_TOKENS = 50000
  // Assumed hard ceiling; pick a value that matches your own budget
  const HARD_LIMIT_TOKENS = 200000

  const { data: conversation } = await supabase
    .from('conversations')
    .select('total_tokens')
    .eq('id', conversationId)
    .single()

  if (!conversation) return true

  if (conversation.total_tokens >= HARD_LIMIT_TOKENS) {
    return false // Hard stop: the caller should refuse to run the agent
  }

  if (conversation.total_tokens >= SUMMARIZE_AT_TOKENS) {
    // Over the soft limit: compress history, then continue
    await summarizeConversation(conversationId)
  }

  return true
}

// ---- Cost Estimation ----
// Estimate cost before running the agent

const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.50 / 1_000_000, output: 10.00 / 1_000_000 },
  'gpt-4o-mini': { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 },
  'text-embedding-3-small': { input: 0.02 / 1_000_000, output: 0 },
}

export function estimateCost(
  modelName: string,
  inputTokens: number,
  outputTokens: number
): number {
  const costs = MODEL_COSTS[modelName] ?? MODEL_COSTS['gpt-4o']
  return (inputTokens * costs.input) + (outputTokens * costs.output)
}

// ---- Usage Tracking ----
// Store token usage per conversation for billing and monitoring

export async function trackUsage(
  conversationId: string,
  modelName: string,
  inputTokens: number,
  outputTokens: number
) {
  const supabase = await createClient()
  const cost = estimateCost(modelName, inputTokens, outputTokens)

  await supabase.from('usage_logs').insert({
    conversation_id: conversationId,
    model: modelName,
    input_tokens: inputTokens,
    output_tokens: outputTokens,
    cost,
  })

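  // Update the conversation total atomically through the Postgres function.
  // An rpc() call cannot be passed as a column value inside update().
  const { error } = await supabase.rpc('increment_conversation_tokens', {
    conv_id: conversationId,
    tokens: inputTokens + outputTokens,
  })
  if (error) {
    console.error('Failed to update conversation token total:', error.message)
  }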
}

Token Costs Scale with Conversation Length

Each agent iteration sends the full conversation history to the LLM — costs compound
A 20-turn conversation with 10 tool calls per turn = about 200 LLM calls = 100,000+ tokens, often far more once history is resent on every call
Summarization reduces token count by 80-90% while preserving context
Use gpt-4o-mini for simple tasks ($0.15/M input tokens) vs gpt-4o ($2.50/M input tokens)
Track per-conversation token usage in Supabase — set hard limits and alert on budget exceedance
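
As a sketch of how summarization shrinks a history, the function below replaces everything except the most recent messages with a single summary message once a token threshold is crossed. The names `compactHistory` and `estimateTokens`, the 4-characters-per-token heuristic, and the synchronous `summarize` callback are all assumptions for illustration; in practice the summary would come from an asynchronous LLM call.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Rough heuristic (assumption): about 4 characters per token for English text.
// Use a real tokenizer when you need accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Replace everything except the last `keepRecent` messages with one summary
// message once the history exceeds `maxTokens`. `summarize` is injected so the
// sketch stays independent of any particular LLM client.
function compactHistory(
  messages: ChatMessage[],
  maxTokens: number,
  keepRecent: number,
  summarize: (old: ChatMessage[]) => string
): ChatMessage[] {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0)
  if (total <= maxTokens || messages.length <= keepRecent) return messages

  const old = messages.slice(0, messages.length - keepRecent)
  const recent = messages.slice(messages.length - keepRecent)
  const summaryMessage: ChatMessage = {
    role: 'system',
    content: `[Previous conversation summary]: ${summarize(old)}`,
  }
  return [summaryMessage, ...recent]
}
```

Keeping the last few messages verbatim preserves the exact wording of the current exchange, while the summary carries only the facts the agent still needs from earlier turns.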

Production Insight

Each agent iteration includes the full conversation history — costs compound with every turn.

Summarization reduces token count by 80-90% while preserving key context.

Rule: summarize after 8,000 tokens, use gpt-4o-mini for simple tasks, track per-conversation costs.

Key Takeaway

Token costs are the primary production concern — each iteration sends full history to the LLM.

Summarization, model selection, and token budgets control spend — implement all three.

Track per-conversation usage in Supabase — set hard limits and alert on exceedance.

Deployment: Environment, Monitoring, and Failure Handling

Production deployment requires three additions beyond the development setup: environment variable management (API keys, Supabase credentials), monitoring (token usage, error rates, latency), and failure handling (tool errors, LLM timeouts, rate limits).

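The deployment target matters. Vercel has a 10-second default timeout for Serverless Functions (up to 300 seconds on the Pro plan), and agent loops with multiple tool calls can exceed it. Check Vercel's current limits, since they vary by plan and change over time. The Edge Runtime is not a way around this: an Edge Function must start sending its response within about 25 seconds (a streamed response can continue after that), and it has no Node.js APIs. For long-running agents, stream the response, raise the route's maxDuration, or deploy the agent logic to a separate service (AWS Lambda with its 15-minute timeout, or a containerized service).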
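
Whatever the platform limit is, it helps to fail before the platform kills the function, so the user gets a readable message instead of a dropped connection. A minimal sketch follows; `withTimeout` is a hypothetical helper, not part of LangChain or Next.js.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying work. Pass an
// AbortSignal to the LLM client as well if it supports one.
function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timeout after ${ms}ms`)), ms)
  })
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer)
  })
}
```

A call such as `withTimeout(agent.invoke({ input }), 8_000, 'agent run')` would reject a couple of seconds before a 10-second platform limit. The error message contains the word "timeout", so a handler that checks `error.message.includes('timeout')` can turn it into a friendly reply.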

io.thecodeforge.ai-agent.deployment.ts (TypeScript)

// ============================================
// Deployment Configuration
// ============================================

// ---- Environment Variables (.env.local) ----
// NEVER commit API keys to the repository

// OPENAI_API_KEY=sk-...
// SUPABASE_URL=https://your-project.supabase.co
// SUPABASE_ANON_KEY=eyJ...
// SUPABASE_SERVICE_ROLE_KEY=eyJ... (for server-side only)
// NODE_ENV=production

// ---- next.config.ts ----
// Configure for AI workloads

import type { NextConfig } from 'next'

const nextConfig: NextConfig = {
  // Increase body size limit for conversation history
  experimental: {
    serverActions: {
      bodySizeLimit: '2mb',
    },
  },
  // Configure for Vercel deployment
  serverExternalPackages: ['@langchain/openai', '@langchain/core'],
}

export default nextConfig

// ---- Rate Limiting (Upstash Redis, works in serverless) ----
// File: middleware.ts
// An in-memory Map does not survive across serverless instances, so use Redis

import { NextRequest, NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = new Redis({ url: process.env.UPSTASH_REDIS_REST_URL!, token: process.env.UPSTASH_REDIS_REST_TOKEN! })

const ratelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per minute
})

// Next.js only runs middleware exported as `middleware` (or as the default export)
export async function middleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith('/api/chat')) {
    // x-forwarded-for can hold a comma-separated chain; the first entry is the client
    const ip = request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'unknown'
    const { success } = await ratelimit.limit(ip)
    if (!success) {
      return NextResponse.json(
        { error: 'Rate limit exceeded. Please wait before sending another message.' },
        { status: 429 }
      )
    }
  }
  return NextResponse.next()
}

// Only run the middleware on API routes
export const config = {
  matcher: ['/api/:path*'],
}

// ---- Error Handling Wrapper ----
// Catch and log agent errors for observability
export async function runAgentWithErrorHandling(
  conversationId: string,
  input: string
) {
  const startTime = Date.now()

  try {
    const agent = await createAgent(conversationId)
    const result = await agent.invoke({ input })

    // Log success
    await logAgentRun({
      conversationId,
      status: 'success',
      duration: Date.now() - startTime,
      steps: result.intermediateSteps?.length ?? 0,
      input,
      output: result.output,
    })

    return result
  } catch (error) {
    // Log failure
    await logAgentRun({
      conversationId,
      status: 'error',
      duration: Date.now() - startTime,
      error: error instanceof Error ? error.message : 'Unknown error',
      input,
    })

    // Return user-friendly error
    if (error instanceof Error && error.message.includes('timeout')) {
      return {
        output: 'The request took too long to process. Please try a simpler question or try again later.',
      }
    }

    if (error instanceof Error && error.message.includes('rate_limit')) {
      return {
        output: 'The AI service is currently experiencing high demand. Please wait a moment and try again.',
      }
    }

    return {
      output: 'I encountered an error processing your request. Please try again or contact support.',
    }
  }
}

async function logAgentRun(data: Record<string, unknown>) {
  const supabase = await createClient()
  await supabase.from('agent_runs').insert(data)
}

// ---- Health Check Endpoint ----
// File: app/api/health/route.ts
export async function GET() {
  const checks = {
    openai: false,
    supabase: false,
  }

  // Check OpenAI
  try {
    const response = await fetch('https://api.openai.com/v1/models', {
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    })
    checks.openai = response.ok
  } catch {
    checks.openai = false
  }

  // Check Supabase
  try {
    const supabase = await createClient()
    const { error } = await supabase.from('conversations').select('id').limit(1)
    checks.supabase = !error
  } catch {
    checks.supabase = false
  }

  const healthy = Object.values(checks).every(Boolean)

  return Response.json(
    { status: healthy ? 'healthy' : 'degraded', checks },
    { status: healthy ? 200 : 503 }
  )
}

Deployment Considerations

Vercel Serverless Functions have a 10-second timeout (300s on Pro) — agent loops may exceed this
The Edge Runtime is not an escape hatch: responses must start within about 25 seconds, and Node.js APIs are unavailable
Rate limiting prevents abuse — 10 messages per minute per user is a reasonable default
Health checks verify OpenAI and Supabase connectivity — return 503 if either is down
Log every agent run with duration, steps, and errors — observability is critical for debugging
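
Upstream rate limits are one of the failure modes this section lists, and they are usually transient. The sketch below retries with exponential backoff. `retryWithBackoff` and `isRateLimit` are hypothetical helpers, and the assumption that provider errors contain the substring "rate_limit" follows the check used in the error handler above.

```typescript
// Retry `fn` while `isRetryable(error)` is true, doubling the delay each attempt.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      // Give up on the last attempt or on errors that will not fix themselves
      if (attempt === maxAttempts || !isRetryable(error)) break
      const delayMs = baseDelayMs * 2 ** (attempt - 1) // 500, 1000, 2000, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
  throw lastError
}

// Assumption: the provider's rate-limit errors mention "rate_limit" in the message
const isRateLimit = (error: unknown): boolean =>
  error instanceof Error && error.message.includes('rate_limit')
```

Wrap individual LLM or embedding calls rather than the whole agent run. Retrying a full run repeats every tool call in it, including ones with side effects such as ticket creation.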

Production Insight

Vercel Serverless Functions have a 10-second timeout — agent loops with many tools may exceed it.

Rate limiting prevents abuse — 10 messages per minute per user is a reasonable baseline.

Rule: log every agent run with duration and error details — observability is critical for production.

Key Takeaway

Deployment requires rate limiting, health checks, and error handling — not just the agent code.

Vercel timeout limits constrain agent loop length. Consider streaming, a higher maxDuration, or a separate service.

Log every agent run — duration, steps, errors, and token usage for observability.

What No One Tells You About Prerequisites — And Why Missing One Wastes a Day

You don't need a laundry list of packages. You need a runtime that won't silently fail on streaming. Node 18+ is mandatory: LangChain.js targets Node 18 and newer, and on older runtimes streaming and callback handling can stall mid-loop without a useful error message.

Your local database matters too. Run Supabase locally with supabase start from day one. Indexes, real-time subscriptions, and row-level security are much easier to get right on a local stack you can reset and migrate than on a shared remote project, and surprises show up before they reach the cloud. Burn that hour upfront.

CopilotKit's @copilotkit/react-core expects a context provider wrapping your Next.js app. Forget this and your agent will appear deaf — no errors, just an unresponsive chat window. The docs bury this. Now you know.

Last trap: OpenAI keys with no budget cap. Your agent's first runaway loop costs you $12 before you catch it. Set a hard limit in the OpenAI dashboard. Your wallet will thank you.
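
A dashboard limit is the last line of defense. An in-process budget stops a runaway loop much earlier. The sketch below is a minimal guard; the `AgentBudget` class, its limits, and the idea of calling `record` once per LLM call are assumptions for illustration, not a LangChain API.

```typescript
// Hypothetical guard: names and limits are illustrative, not from LangChain.
class AgentBudgetExceeded extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'AgentBudgetExceeded'
  }
}

class AgentBudget {
  private iterations = 0
  private spentUsd = 0
  private readonly maxIterations: number
  private readonly maxUsd: number

  constructor(maxIterations: number, maxUsd: number) {
    this.maxIterations = maxIterations
    this.maxUsd = maxUsd
  }

  // Call once per LLM call with that call's estimated cost in USD
  record(costUsd: number): void {
    this.iterations += 1
    this.spentUsd += costUsd
    if (this.iterations > this.maxIterations) {
      throw new AgentBudgetExceeded(`Stopped after ${this.maxIterations} iterations`)
    }
    if (this.spentUsd > this.maxUsd) {
      throw new AgentBudgetExceeded(`Stopped after spending $${this.spentUsd.toFixed(2)}`)
    }
  }
}
```

Create one budget per agent run, for example `new AgentBudget(15, 0.5)`, and call `record` from wherever token usage is tracked. LangChain's AgentExecutor also accepts a `maxIterations` option, which covers the iteration half of this guard.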

PrerequisiteCheck.js (JavaScript)

// io.thecodeforge — javascript tutorial

// Verify environment before you write a single agent.
// Run as an ES module (.mjs or "type": "module") so top-level await works.
import { createClient } from '@supabase/supabase-js';

const REQUIRED_NODE_VERSION = 18;

const nodeMajor = parseInt(process.version.slice(1).split('.')[0], 10);
if (nodeMajor < REQUIRED_NODE_VERSION) {
  console.error(`FATAL: Node ${process.version} detected. Need >= ${REQUIRED_NODE_VERSION}.`);
  process.exit(1);
}

// Confirm the local Supabase stack is alive
const localSupabase = createClient(
  'http://localhost:54321',
  process.env.SUPABASE_LOCAL_SERVICE_ROLE_KEY
);

// head: true requests only the row count, so no rows are transferred
const { error } = await localSupabase
  .from('conversations')
  .select('*', { count: 'exact', head: true });
if (error) {
  console.error('Supabase local container not responding. Run `supabase start` first.');
  process.exit(1);
}

console.log(`Env validated. Node ${process.version}, Supabase live. Building agent.`);

Output

Env validated. Node v20.11.0, Supabase live. Building agent.

Production Trap:

Never hardcode your Supabase URL or project ref in a committed .env file for CI. Inject them from your CI provider's secret store and use supabase link to bind the CLI to the right project. Hardcoded project refs produce silent 500s when a staging project is paused, restarted, or replaced.

Key Takeaway

Prerequisites are not a checklist — they are a gating process. Test every dependency before you write one line of agent logic.

The Dependency Install That Breaks — And How to Preempt It

npm install @copilotkit/react-core @copilotkit/runtime langchain @langchain/openai @supabase/supabase-js looks innocent. But LangChain and CopilotKit can resolve different zod versions. When that happens you end up with two zod instances in your bundle, and schema checks that depend on instance identity can stop working, so your tool schemas may accept input they should reject, including malformed JSON.

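Use npm ls zod after install. If you see more than one version, run npm dedupe. If that does not collapse them, pin a single version with the overrides field in package.json. Setting legacy-peer-deps=true in .npmrc silences the peer-dependency conflict during install, but it does not remove a duplicate copy. It's ugly. Use it only to get unblocked.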

CopilotKit's runtime package requires openai as a peer. LangChain uses its own wrapper. Import order matters. Pull in @langchain/openai before @copilotkit/runtime. If you reverse them, you get cryptic errors about missing ChatOpenAI constructors.

One more footgun: Supabase's @supabase/ssr package for Next.js App Router. Your server actions will drop session cookies unless you use the createServerClient factory exactly as their docs show. Copy-paste it. Don't reimplement.

DependencyCheck.js (JavaScript)

// io.thecodeforge — javascript tutorial

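// Run this after install to catch version fights
import { execSync } from 'node:child_process';

// Walk the `npm ls --json` tree and record which package pulls in which zod version
function collectZodVersions(node, parent = '(root)', found = {}) {
  for (const [name, dep] of Object.entries(node.dependencies ?? {})) {
    if (name === 'zod' && dep.version) found[parent] = { version: dep.version };
    collectZodVersions(dep, name, found);
  }
  return found;
}

try {
  // --all includes nested copies; --depth=0 would only show the top level
  const raw = execSync('npm ls zod --all --json', { encoding: 'utf8' });
  const found = collectZodVersions(JSON.parse(raw));
  const uniqueVersions = new Set(Object.values(found).map(d => d.version));

  if (uniqueVersions.size > 1) {
    console.warn('WARN: Multiple zod versions detected. Run `npm dedupe`.');
    console.table(found);
  } else {
    console.log('OK: Single zod version in tree.');
  }
} catch {
  // npm ls exits non-zero when zod is absent or the tree has unmet peers
  console.log('npm ls failed: zod is not installed yet, or the tree has problems.');
}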

Output

WARN: Multiple zod versions detected. Run `npm dedupe`.

┌────────────┬─────────┐
│ (index)    │ version │
├────────────┼─────────┤
│ langchain  │ 3.22.4  │
│ copilotkit │ 3.21.0  │
└────────────┴─────────┘

Senior Shortcut:

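Lock LangChain to @langchain/core@0.1.50, since newer releases can break CopilotKit's runtime. Pin it in package.json with the overrides field (npm) or the resolutions field (Yarn), and re-check CopilotKit's peer range before you lift the pin. You'll thank me when the next major ships.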

Key Takeaway

Package resolution in the LangChain + CopilotKit + Supabase stack is a bug farm. Validate your tree, pin versions, and import in the right order.

Why Your Agent Can't See Supabase Data — And How to Fix It

You wired the database. The agent calls a tool named searchBlogs. The tool returns nothing. You check Supabase — rows are there. The problem isn't your SQL. It's the invisible wall between CopilotKit's frontend runtime and your Supabase client.

CopilotKit agents run on the server. When you pass supabase as a client to your tool function, you're passing a browser-side client. It can't see rows protected by Row Level Security unless you forward the authenticated session. The agent gets zero results. No error. Just silence.

Solution: create a server-side Supabase client inside your tool handler using the user's JWT from the request context. CopilotKit's CopilotRuntime exposes context.user with the token. Use createServerClient with that token. Every tool call then runs with the user's permissions.

Second trap: vector search queries return null metadata when your match_documents function doesn't select the right columns. Always select metadata ->> 'title' explicitly in your Postgres function, so the agent receives a plain text field rather than a JSON blob.

SupabaseAuthTool.js (JavaScript)

// io.thecodeforge — javascript tutorial

// Server-side tool with authenticated Supabase access
import { createClient } from '@supabase/supabase-js';
import { tool } from '@langchain/core/tools';
import { z } from 'zod';

export const searchUserBlogs = tool(
  async ({ query }, config) => {
    // The JWT comes from the server-side request context, never from the LLM.
    // Assumes the caller passes it as { configurable: { userToken } } when invoking.
    const userToken = config?.configurable?.userToken;
    if (!userToken) {
      return JSON.stringify({ error: 'Not authenticated' });
    }

    // Anon key + the user's JWT: every query runs under that user's RLS policies.
    // The service role key would bypass RLS entirely.
    const supabase = createClient(
      process.env.SUPABASE_URL,
      process.env.SUPABASE_ANON_KEY,
      {
        global: { headers: { Authorization: `Bearer ${userToken}` } },
        auth: { autoRefreshToken: false, persistSession: false },
      }
    );

    const { data: blogs, error } = await supabase
      .from('blogs')
      .select('title, content, created_at')
      .textSearch('content', query)
      .limit(5);

    if (error) {
      return JSON.stringify({ error: error.message });
    }
    return JSON.stringify(blogs ?? []);
  },
  {
    name: 'searchUserBlogs',
    description: 'Search the authenticated user\'s blog posts',
    schema: z.object({
      query: z.string().describe('Keywords to search for in the blog content'),
    }),
  }
);

Output

[{"title":"Building AI Agents","content":"...","created_at":"2024-03-15T10:00:00Z"}]

Production Trap:

Never use the service_role key for user-scoped queries. It bypasses RLS and exposes every user's data. Create the client with the anon key and the user's JWT so row-level security is enforced. Your clients will thank you when the audit comes.

Key Takeaway

Your agent's Supabase queries fail because the auth context is wrong — not the SQL. Always pass the user JWT into tool handlers and create a fresh server client.

The Dependency Install That Breaks — And How to Preempt It

You think npm install langchain is safe? Wrong. The LangChain.js ecosystem is a minefield of breaking updates that nuke your agent loop before it runs. You pull a version that dropped Chain in favor of Runnable without warning — now your imports fail silently, your streaming endpoint crashes, and you waste a day tracing phantom bugs.

The fix: lock your versions before you even think about coding. Use npm install langchain@0.1.x — the stable 0.1 series. Pair it with @langchain/openai@0.0.x and @langchain/community@0.0.x. Forget that, and you'll hit Cannot find module '@langchain/core/dist/runnables' at 2 AM.

Second: test your install with a single agent creation before wiring into Next.js. Run node -e "const { OpenAI } = require('@langchain/openai'); console.log('ok')". If it errors, the package is missing or cannot be resolved from your project root, which usually points to a hoisting problem. With pnpm, setting shamefully-hoist=true in your .npmrc produces a flat node_modules. This kills the "why can't LangChain see OpenAI" problem dead.

verify-install.js (JavaScript)

// io.thecodeforge — javascript tutorial
// Verify LangChain + OpenAI install before Next.js

const { ChatOpenAI } = require('@langchain/openai');

const model = new ChatOpenAI({ temperature: 0 });

async function test() {
  const res = await model.invoke('Just say OK');
  console.log('Agent ready:', res.content);
}

test().catch(err => console.error('Install broken:', err.message));

Output

Agent ready: OK

Production Trap:

Never use npm update on LangChain packages without reading changelogs. A single 0.x minor bump can rename or move the APIs your streaming logic depends on.

Key Takeaway

Lock LangChain to 0.1.x before a single import. Test install standalone before touching Next.js routes.

Why Your Agent Can't See Supabase Data — And How to Fix It

You built a tool that queries Supabase for user invoices. Your agent returns "I can't find that data" even though the Supabase table is full. Problem: you passed raw SQL to an AI agent that doesn't know your schema. The agent generates hallucinated column names like user_email when your table has email_address. That's not a bug — it's a broken tool design.

Fix: wrap Supabase queries in validated tools that enforce structure before the agent touches them. Use Zod schemas to sanitize inputs and return predictable JSON. For example, your getInvoice tool should accept only userId as a string, query the correct table with typed columns, and return { status: 'found' | 'missing', data: object } every time.

Second: give the agent a schema context. When you define the tool, include a description that lists exact column names: "Find invoices by userId. Table fields: id, email_address, amount, created_at." This stops the agent from inventing its own field names. Test the tool in isolation — call it directly, see the output, then wire it into the agent. If your tool works but the agent fails, your schema description is wrong, not your code.
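
Testing the tool in isolation is easier when the database call is injected. The sketch below shows the predictable result shape and a tool function that can be exercised with a fake data source. The `getInvoice` signature, the `FetchInvoice` type, and the UUID pattern are assumptions for illustration; field names follow the invoices table described above.

```typescript
// Hypothetical shapes: field names follow the invoices table described above.
type Invoice = { id: number; email_address: string; amount: number; created_at: string }

type InvoiceResult =
  | { status: 'found'; data: Invoice }
  | { status: 'missing'; data: null }

// The data source is injected so the tool logic can be tested without Supabase
type FetchInvoice = (userId: string) => Promise<Invoice | null>

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

async function getInvoice(userId: string, fetchInvoice: FetchInvoice): Promise<InvoiceResult> {
  // Reject malformed ids before touching the database
  if (!UUID_PATTERN.test(userId)) return { status: 'missing', data: null }
  const invoice = await fetchInvoice(userId)
  return invoice ? { status: 'found', data: invoice } : { status: 'missing', data: null }
}

// A fake data source standing in for the Supabase query during tests
const fakeFetch: FetchInvoice = async (userId) =>
  userId === '123e4567-e89b-12d3-a456-426614174000'
    ? { id: 42, email_address: 'dev@example.com', amount: 199, created_at: '2024-03-15' }
    : null
```

If `getInvoice` behaves correctly with the fake but the agent still fails, the problem is in the tool description or the arguments the model generates, not in the query code.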

safe-supabase-tool.js (JavaScript)

// io.thecodeforge — javascript tutorial
// Validated Supabase tool with Zod schema

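import { z } from 'zod';
import { createClient } from '@supabase/supabase-js';

// The service role key bypasses RLS, so userId must come from a trusted session,
// never from free-form model output.
const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SERVICE_ROLE_KEY);

const schema = z.object({
  userId: z.string().uuid()
});

export const getInvoiceTool = {
  name: 'get_invoice',
  description: 'Find the most recent invoice by userId. Fields: id, email_address, amount, created_at.',
  schema,
  async call(input) {
    // Validate before querying so a hallucinated id never reaches the database
    const parsed = schema.safeParse(input);
    if (!parsed.success) return { status: 'missing', data: null };

    // maybeSingle() returns null for no rows; single() would error when a
    // user has zero or several invoices
    const { data, error } = await supabase
      .from('invoices')
      .select('id, email_address, amount, created_at')
      .eq('user_id', parsed.data.userId)
      .order('created_at', { ascending: false })
      .limit(1)
      .maybeSingle();

    if (error || !data) return { status: 'missing', data: null };
    return { status: 'found', data };
  }
};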

Output

{ status: 'found', data: { id: 42, email_address: 'dev@example.com', amount: 199, created_at: '2024-03-15' } }

Senior Shortcut:

Always hardcode column names in the tool description. If the agent still hallucinates, prefix your description with 'CRITICAL: Do not change field names.' It helps in most cases, but keep the Zod validation as the real safeguard.

Key Takeaway

Your agent doesn't guess Supabase columns — it hallucinates them. Validate schema in the tool, spell out fields in the description.

Why Your LangChain Agent Fails in Production — And How Next.js App Router Fixes It

Most tutorials show agent loops in simple Node scripts. In production, you need request-scoped context, streaming to the UI, and error recovery without dropping user sessions. Next.js App Router provides this through Server Actions that maintain agent state across multiple function calls.

The key insight is that your agent loop must be a controlled, resumable process, not a single long-running function. By combining Next.js Server Actions with LangChain's RunnableWithMessageHistory, you create a durable agent that survives page refreshes and network interruptions.

Each action call picks up the conversation from Supabase, executes one turn of the agent loop, and streams the response back. This pattern eliminates the common timeout and memory leak issues that plague monolithic agent implementations.
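
The "one turn per call" idea can be sketched without any framework. In the code below, `HistoryStore`, `runOneTurn`, and `MemoryStore` are hypothetical names: the store stands in for a Supabase-backed history, and `respond` stands in for the agent chain.

```typescript
type Turn = { role: 'user' | 'assistant'; content: string }

// Assumed storage interface; in this article's stack it would be backed by Supabase
interface HistoryStore {
  load(sessionId: string): Promise<Turn[]>
  append(sessionId: string, turns: Turn[]): Promise<void>
}

// Runs exactly one turn: load history, produce a reply, persist both messages.
// Because all state lives in the store, any later call can resume the session.
async function runOneTurn(
  store: HistoryStore,
  sessionId: string,
  input: string,
  respond: (history: Turn[], input: string) => Promise<string>
): Promise<string> {
  const history = await store.load(sessionId)
  const reply = await respond(history, input)
  await store.append(sessionId, [
    { role: 'user', content: input },
    { role: 'assistant', content: reply },
  ])
  return reply
}

// In-memory store for local testing only; state is lost when the process exits
class MemoryStore implements HistoryStore {
  private sessions = new Map<string, Turn[]>()

  async load(sessionId: string): Promise<Turn[]> {
    return [...(this.sessions.get(sessionId) ?? [])]
  }

  async append(sessionId: string, turns: Turn[]): Promise<void> {
    this.sessions.set(sessionId, [...(this.sessions.get(sessionId) ?? []), ...turns])
  }
}
```

Nothing is held in memory between calls, so a page refresh or a cold start only costs one extra read from the store.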

agent-action.ts (TypeScript)

// io.thecodeforge — javascript tutorial
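'use server'

import { createServerActionClient } from '@supabase/auth-helpers-nextjs'
import { RunnableWithMessageHistory } from '@langchain/core/runnables'
import { cookies } from 'next/headers'
// agentChain and SupabaseChatHistory are assumed to be defined elsewhere in the project:
// agentChain is the compiled chain built once at module load, and SupabaseChatHistory
// is a chat message history class backed by the messages table.

export async function agentAction(input: string) {
  const supabase = createServerActionClient({ cookies })
  const { data: { session } } = await supabase.auth.getSession()
  if (!session) {
    throw new Error('Not authenticated')
  }

  // The wrapper is cheap to construct; the expensive agentChain is shared across requests
  const agent = new RunnableWithMessageHistory({
    runnable: agentChain,
    getMessageHistory: (sessionId) => new SupabaseChatHistory(sessionId, supabase),
    inputMessagesKey: 'input',
    historyMessagesKey: 'chat_history'
  })
  const stream = await agent.stream({ input }, { configurable: { sessionId: session.user.id } })
  return stream
}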

Output

Returns ReadableStream for Next.js streaming response

Production Trap:

Never create a new agent instance per request. Use dependency injection to share the compiled chain across requests — but scope the message history to the user session.
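
The trap above can be sketched without any framework: build the expensive object once at module scope and key only the message history by session. Everything here (`buildChain`, `getChain`, `getHistory`, `handleRequest`) is a hypothetical stand-in, not LangChain's API, and the `Map` stands in for a Supabase table.

```typescript
type Message = { role: 'user' | 'assistant'; content: string }
type Chain = { run: (history: Message[], input: string) => string }

let buildCount = 0

// Stand-in for compiling prompts, tools, and the model client (expensive).
function buildChain(): Chain {
  buildCount += 1
  return { run: (history, input) => `turn ${history.length / 2 + 1}: ${input}` }
}

// Built lazily once, then reused by every request in this process.
let sharedChain: Chain | undefined
function getChain(): Chain {
  if (!sharedChain) sharedChain = buildChain()
  return sharedChain
}

// History is the only per-user state.
const histories = new Map<string, Message[]>()
function getHistory(sessionId: string): Message[] {
  let history = histories.get(sessionId)
  if (!history) {
    history = []
    histories.set(sessionId, history)
  }
  return history
}

function handleRequest(sessionId: string, input: string): string {
  const history = getHistory(sessionId)
  const output = getChain().run(history, input)
  history.push({ role: 'user', content: input }, { role: 'assistant', content: output })
  return output
}
```

On serverless platforms a module-scope value lives only as long as the warm instance, so the chain is rebuilt on cold start. That is usually acceptable, because the history is stored elsewhere.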

Key Takeaway

Server Actions + RunnableWithMessageHistory = durable, resumable agents.

The Vector Search Index That Wastes Your Credits — And How LangChain + Supabase Fixes It

Every vector search call costs you embedding credits and latency. Naive implementations re-embed the user query on every turn and often retrieve irrelevant chunks that burn through your context window. The fix is a two-tier retrieval system using Supabase's pgvector extension and LangChain's ParentDocumentRetriever. First, split each document into large parent chunks and small child chunks, embed the child chunks once with OpenAI embeddings, and store those vectors in Supabase. Then, at query time, search the small child chunks for precision and return the larger parent chunks they belong to for context. This can reduce token consumption by as much as 70% and cuts down on irrelevant context injection. Supabase supports this with SQL functions that filter by metadata before performing the cosine similarity search. Your agent sees only the top-k relevant parents, not the entire document base.

parent-retriever.ts · TypeScript

// io.thecodeforge — javascript tutorial
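import { SupabaseVectorStore } from '@langchain/community/vectorstores/supabase'
import { OpenAIEmbeddings } from '@langchain/openai'
import { InMemoryStore } from '@langchain/core/stores'
import { ParentDocumentRetriever } from 'langchain/retrievers/parent_document'
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'
import type { SupabaseClient } from '@supabase/supabase-js'

export async function createRetriever(supabaseClient: SupabaseClient) {
  // Child chunk embeddings live in the Supabase 'documents' table
  const store = new SupabaseVectorStore(new OpenAIEmbeddings(), {
    client: supabaseClient,
    tableName: 'documents',
    queryName: 'match_documents'
  })
  const retriever = new ParentDocumentRetriever({
    vectorstore: store,
    // Parent chunks need their own store. InMemoryStore is an assumption for
    // illustration and is lost on restart; use a persistent store in production.
    // (Older LangChain versions name this option `docstore`.)
    byteStore: new InMemoryStore<Uint8Array>(),
    // chunkSize is measured in characters, not tokens
    parentSplitter: new RecursiveCharacterTextSplitter({ chunkSize: 2000 }),
    childSplitter: new RecursiveCharacterTextSplitter({ chunkSize: 400 }),
    childK: 20, // child chunks fetched from the vector search
    parentK: 5  // parent chunks returned to the agent
  })
  return retriever
}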

Output

Retriever that returns at most 5 parent chunks (up to 10,000 characters, since chunkSize counts characters) instead of 50 raw chunks (20,000+ characters)

Production Trap:

Default Supabase vector search returns raw chunks without deduplication. Add a GROUP BY document_id in your match_documents SQL function to avoid flooding the context window with overlapping content.
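
The same deduplication can also be applied in application code after the search returns. A minimal sketch, assuming each match carries a `document_id` and a `similarity` score. The `Match` shape and the `dedupeByDocument` helper are assumptions, not the exact row type returned by `match_documents`.

```typescript
type Match = { document_id: string; content: string; similarity: number }

// Keep only the best-scoring chunk per document, then sort best-first.
function dedupeByDocument(matches: Match[], limit: number): Match[] {
  const best = new Map<string, Match>()
  for (const m of matches) {
    const current = best.get(m.document_id)
    if (!current || m.similarity > current.similarity) {
      best.set(m.document_id, m)
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit)
}
```

Doing this in SQL is cheaper because duplicate rows never leave the database. The application-side version is useful as a second check or when you cannot change the function.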

Key Takeaway

Parent document retriever cuts token costs 70% by fetching whole documents, not random chunks.

● Production incident · POST-MORTEM · severity: high

Agent Infinite Loop Cost $4,200 in OpenAI Credits in 3 Hours

Symptom

The billing alert fired at 2 AM — OpenAI API spend had exceeded the daily budget by 84x. The agent logs showed 14,000 tool calls for a single conversation. The agent was stuck in a loop: search_knowledge_base -> result -> "I need more information" -> search_knowledge_base -> same result -> repeat.

Assumption

The AgentExecutor default maxIterations (15) would prevent infinite loops. The team assumed the agent would stop after 15 steps and return a partial answer.

Root cause

The team had set maxIterations to 0 (unlimited) during development to debug a complex multi-step workflow and never changed it back. The agent's prompt did not include a fallback instruction for when tools return no new information. The knowledge base search returned the same 3 results every time, but the agent kept rephrasing the query because the prompt said "search until you find the answer." Without a max iteration limit or a "stop if no new information" instruction, the loop continued indefinitely.

Fix

Set maxIterations to 10 for production. Added a tool-result deduplication check: if the same search result is returned twice, the agent is instructed to synthesize an answer from the available information instead of searching again. Added a hard cost ceiling via OpenAI usage limits ($100/day). Added per-conversation token tracking in Supabase; conversations exceeding 8,000 tokens trigger automatic summarization. Added an alert for any conversation exceeding 20 tool calls.

Key lesson

Always set maxIterations on the AgentExecutor: unlimited loops will drain your API budget.
Include fallback instructions in the agent prompt: what to do when tools return no new information.
Set OpenAI usage limits as a safety net — they are the last line of defense against cost overruns.
Track per-conversation token usage — summarize or truncate history when it exceeds your budget threshold.
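
The fix and lessons above can be combined into a small guard that sits between the agent and its tools. This is a sketch, not the team's actual code: the class name `LoopGuard`, its limit, and the choice to compare raw result strings are all assumptions.

```typescript
type GuardDecision =
  | { action: 'continue' }
  | { action: 'synthesize'; reason: string } // answer from what we already have
  | { action: 'abort'; reason: string } // hard stop

class LoopGuard {
  private calls = 0
  private seen = new Set<string>()
  private maxToolCalls: number

  constructor(maxToolCalls: number) {
    this.maxToolCalls = maxToolCalls
  }

  // Call once per tool result, before handing the result back to the model.
  check(toolName: string, result: string): GuardDecision {
    this.calls += 1
    if (this.calls > this.maxToolCalls) {
      return { action: 'abort', reason: `exceeded ${this.maxToolCalls} tool calls` }
    }
    const key = `${toolName}:${result}`
    if (this.seen.has(key)) {
      return { action: 'synthesize', reason: `${toolName} returned a result already seen` }
    }
    this.seen.add(key)
    return { action: 'continue' }
  }
}
```

Create one guard per conversation turn. On `synthesize`, replace the tool output with an instruction to answer from the information already gathered. On `abort`, return a partial answer and raise the alert.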

Production debug guide: Diagnose agent loops, tool failures, and memory issues

#	Symptom	Fix
01	Agent enters infinite tool-calling loop	Check maxIterations on the AgentExecutor and set it to 10 for production. Add deduplication for tool results.
02	Agent hallucinates tool input parameters	Add Zod schema validation to tool input, so invalid inputs are rejected before the tool executes.
03	Conversation history not persisting across requests	Verify the Supabase client is writing to the conversations table on each message, and check RLS policies.
04	Streaming response stops mid-token	Check that the Route Handler returns streamText().toDataStreamResponse() and the client uses useChat from Vercel AI SDK.
05	Agent ignores tools and answers directly	Check the agent prompt: ensure tool descriptions are clear and the prompt instructs the agent to use tools.
06	Vector search returns irrelevant results	Check embedding model consistency: the same model must be used for indexing and querying. Verify the match_threshold parameter (0.78-0.82 for text-embedding-3-small).
07	Token costs higher than expected	Check conversation history length, since long histories multiply token usage. Implement summarization for conversations over 8,000 tokens.
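
For the token-cost case, a rough budget check can decide when to summarize. The sketch below assumes about four characters per token, which is only a heuristic for English text; a real tokenizer gives exact counts. `estimateTokens` and `trimToBudget` are hypothetical helper names.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Heuristic only: roughly 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Keep the most recent messages that fit in the budget, and report whether
// older messages were dropped so the caller can summarize them instead.
function trimToBudget(messages: ChatMessage[], maxTokens: number) {
  const kept: ChatMessage[] = []
  let used = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content)
    if (used + cost > maxTokens) break
    kept.unshift(messages[i])
    used += cost
  }
  return { kept, used, needsSummary: kept.length < messages.length }
}
```

When `needsSummary` is true, the dropped prefix is what gets summarized and prepended as a single short message.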

★ AI Agent Quick Debug Reference: Fast commands for diagnosing agent, tool, and memory issues

Agent not calling tools

Immediate action

Check agent prompt and tool descriptions

Commands

grep -rn 'tool\|Tool\|description' lib/agent/ --include='*.ts' | head -20

head -60 lib/agent/tools.ts

Fix now

Verify tools are passed to createToolCallingAgent and each tool has a clear description string

Agent Frameworks Compared

Framework	Language	Agent Type	Streaming	Tool Ecosystem	Production Ready	Best For
LangChain	Python, JS/TS	ReAct, Tool Calling	Yes (with streamEvents)	Large	Yes*	Complex multi-tool agents, RAG pipelines
Vercel AI SDK	JS/TS	OpenAI Functions	Yes (native)	Small	Yes	Simple chat agents, streaming-first apps
CrewAI	Python	Role-based multi-agent	Limited	Medium	Growing	Multi-agent collaboration, research tasks
AutoGen	Python	Conversational multi-agent	Yes	Medium	Yes	Multi-agent conversations, code generation
Direct OpenAI API	Any	Function calling	Yes	None (manual)	Yes	Simple single-tool agents, full control

⚙ Quick Reference

13 commands from this guide

File	Command / Code	Purpose
io.thecodeforge.ai-agent.architecture.ts	const llm = new ChatOpenAI({	Architecture
io.thecodeforge.ai-agent.supabase-schema.sql	CREATE EXTENSION IF NOT EXISTS vector;	Supabase
io.thecodeforge.ai-agent.client-component.tsx	'use client'	Next.js Integration
io.thecodeforge.ai-agent.tools.ts	export const searchKnowledgeBase = new DynamicStructuredTool({	Tools
io.thecodeforge.ai-agent.cost-control.ts	export async function getConversationHistory(conversationId: string, maxTokens: ...	Cost Control
io.thecodeforge.ai-agent.deployment.ts	const nextConfig: NextConfig = {	Deployment
PrerequisiteCheck.js	const REQUIRED_NODE_VERSION = 18;	What No One Tells You About Prerequisites
DependencyCheck.js	try {	The Dependency Install That Breaks
SupabaseAuthTool.js	export const searchUserBlogs = tool(	Why Your Agent Can't See Supabase Data
verify-install.js	const { OpenAI } = require('@langchain/openai');	The Dependency Install That Breaks
safe-supabase-tool.js	const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SER...	Why Your Agent Can't See Supabase Data
agent-action.ts	export async function agentAction(input: string) {	Why Your LangChain Agent Fails in Production
parent-retriever.ts	export async function createRetriever(supabaseClient) {	The Vector Search Index That Wastes Your Credits

Key takeaways

An agent decides tool sequences dynamically, unlike chains, which follow fixed steps.

maxIterations on the AgentExecutor prevents infinite loops: always set it to 10 in production.

Supabase stores conversation history and vector embeddings: the database is the agent's memory.

Conversation summarization reduces token costs by 80-90% while preserving context.

Tool descriptions guide the LLM's tool selection: write them like instructions, not documentation.

Symptom

When the agent produces a wrong answer or enters a loop, there is no way to debug what happened. No visibility into which tools were called, what inputs were used, or how many steps were taken.

Fix

Log every agent run with: conversation ID, input, output, intermediate steps, duration, token count, and error details. Use these logs to identify patterns in agent failures.
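
A minimal sketch of such a log record follows. The field names mirror the list above. The `AgentRunLog` type and `buildRunLog` helper are hypothetical, and where the record is written (Supabase, stdout, a tracing service) is left open.

```typescript
type ToolStep = { tool: string; input: unknown; output: string }

type AgentRunLog = {
  conversationId: string
  input: string
  output: string | null
  steps: ToolStep[]
  stepCount: number
  durationMs: number
  tokenCount: number
  error: string | null
}

// Build one flat, JSON-serializable record per agent run.
function buildRunLog(params: {
  conversationId: string
  input: string
  output?: string
  steps: ToolStep[]
  startedAt: number
  finishedAt: number
  tokenCount: number
  error?: unknown
}): AgentRunLog {
  const { error } = params
  return {
    conversationId: params.conversationId,
    input: params.input,
    output: params.output ?? null,
    steps: params.steps,
    stepCount: params.steps.length,
    durationMs: params.finishedAt - params.startedAt,
    tokenCount: params.tokenCount,
    error: error === undefined ? null : error instanceof Error ? error.message : String(error),
  }
}
```

Writing the record in a `finally` block ensures that failed runs are logged too, and those are the runs you most need to inspect.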

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR: What is the difference between an AI agent and a chain in LangChain?

Q02 · SENIOR: How do you prevent an AI agent from entering an infinite loop?

Q03 · SENIOR: How do you manage conversation memory in a stateless serverless environm...

Q04 · SENIOR: How do you control token costs for an AI agent in production?

Q05 · JUNIOR: What is the role of Supabase in an AI agent architecture?

Q01 of 05 · SENIOR

What is the difference between an AI agent and a chain in LangChain?

ANSWER

A chain follows a fixed sequence of steps — retrieve context, format prompt, generate response. The sequence is defined at build time and does not change based on the input. An agent decides the sequence dynamically. It uses an LLM to reason about which tool to call, calls the tool, observes the result, and decides the next step. The sequence is determined at runtime based on the input and intermediate results. The trade-off: agents are more flexible (they can handle unexpected queries by choosing different tools) but more expensive (each step is a separate LLM call) and harder to debug (the sequence is non-deterministic). Chains are cheaper, faster, and predictable but limited to predefined workflows.
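
The contrast can be shown in a few lines of plain TypeScript. This is a toy sketch with no LLM: `decide` is a hypothetical stand-in for the model's reasoning step, and the tools are fakes.

```typescript
type Tool = (input: string) => string
const tools: Record<string, Tool> = {
  lookupOrder: (id) => `order ${id}: shipped`,
  searchDocs: (q) => `docs about ${q}`,
}

// Chain: the steps and their order are fixed when the code is written.
function runChain(question: string): string {
  const context = tools.searchDocs(question)
  return `answer using [${context}]`
}

// Agent: a decision function picks the next step at runtime.
type Decision = { tool: string; input: string } | { final: string }

function decide(question: string, observations: string[]): Decision {
  if (observations.length > 0) {
    return { final: `answer using [${observations.join('; ')}]` }
  }
  const order = question.match(/order (\d+)/)
  return order
    ? { tool: 'lookupOrder', input: order[1] }
    : { tool: 'searchDocs', input: question }
}

function runAgent(question: string, maxIterations = 5): string {
  const observations: string[] = []
  for (let i = 0; i < maxIterations; i++) {
    const decision = decide(question, observations)
    if ('final' in decision) return decision.final
    observations.push(tools[decision.tool](decision.input))
  }
  return 'stopped: iteration limit reached'
}
```

The chain always calls `searchDocs`. The agent picks `lookupOrder` for order questions, and its loop needs an explicit iteration cap because nothing else guarantees it terminates.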

FAQ · 5 QUESTIONS

Frequently Asked Questions

Can I use a different LLM instead of OpenAI with LangChain?

How do I test an AI agent in my CI/CD pipeline?

How do I handle multi-agent architectures where agents delegate to each other?

What is the difference between gpt-4o and gpt-4o-mini for agent tasks?

How do I deploy an AI agent that exceeds Vercel's serverless timeout?

Naren Founder & Principal Engineer

20+ years shipping production JavaScript and front-end systems at scale. Notes here come from systems that actually shipped.

✓ Verified: production tested

Last updated: July 04, 2026

1,727 articles, all by Naren
