Junior 11 min · April 12, 2026

AI Agent Infinite Loops — LangChain + Next.js Cost Control

One missing maxIterations caused 14,000 tool calls and $4,200 in OpenAI spend.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • An AI Agent uses an LLM to decide which tools to call and in what order — unlike a chain, it reasons about actions
  • LangChain provides the agent framework: createToolCallingAgent builds the agent, AgentExecutor runs the ReAct loop (reason, act, observe), and RunnableWithMessageHistory adds memory
  • Supabase stores conversation history and vector embeddings for long-term memory
  • Next.js 16 Route Handlers stream responses via the Vercel AI SDK (streamText); Server Actions handle non-streaming work
  • Token costs scale with conversation length — truncate or summarize history to control spend
  • Biggest mistake: no tool guardrails. Agents can call tools indefinitely without an iteration cap (maxIterations in LangChain, maxSteps in the Vercel AI SDK) or deduplication
What Are AI Agent Infinite Loops?

An AI agent loop is the runtime cycle where an LLM repeatedly decides, acts, and observes until it reaches a final answer — but without guardrails, it becomes a money incinerator. Each iteration calls the model, processes tool outputs, and re-invokes the LLM, burning tokens on every turn.

In a LangChain agent hooked into a Next.js app, a single user query can trigger dozens of loops, each costing fractions of a cent that compound into real bills. The problem isn't the loop itself — it's unbounded loops, where the agent gets stuck rethinking, re-calling tools, or generating verbose intermediate reasoning without ever terminating.

This article walks you through building an agent that uses Supabase for persistent memory and vector search, Next.js for streaming responses via Route Handlers, and explicit cost controls like token budgets, loop limits, and conversation summarization. You'll learn to instrument every step, cap spend per session, and detect infinite loops before they drain your API credits.

Plain-English First

An AI Agent is like a smart assistant with a toolbox. Instead of just answering questions, it decides which tool to use — searching the web, querying a database, running calculations — and uses the result to inform its next step. LangChain is the framework that manages this decision loop. Supabase stores the conversation so the agent remembers what happened earlier. Next.js serves the interface and streams the agent's responses back to the user in real time.

Building an AI agent requires orchestrating four components: the LLM (reasoning engine), tools (external capabilities), memory (conversation history), and a runtime loop (agent executor). LangChain provides the agent framework with ReAct prompting — the model reasons about what to do, selects a tool, observes the result, and decides the next step.

Supabase serves two core roles: PostgreSQL stores conversation history for session continuity, and pgvector stores embeddings for semantic search over past interactions. Next.js 16 provides the API layer: Server Actions and Route Handlers process requests, and the Vercel AI SDK streams token-by-token responses to the client.

The production challenges are cost control (token usage scales with history length), latency (multi-step agent loops add sequential delay), reliability (tools can fail, LLMs can hallucinate tool inputs), and observability (debugging why an agent chose a specific action). This guide covers the complete implementation with production patterns for each.

What an AI Agent Loop Actually Is — and Why It Burns Money

An AI agent loop is an iterative cycle in which an LLM generates actions, those actions trigger tool calls or API requests, and the output feeds back into the LLM for the next decision. In a Next.js + LangChain + Supabase stack, this loop lives server-side (often in a Route Handler or Server Action) and persists state in Supabase: PostgreSQL for conversation history and tool results, pgvector for embeddings. The core mechanic: the LLM decides whether to continue or stop based on a system prompt and accumulated context. Without hard cost guards it can spin indefinitely, with each turn costing tokens and adding latency.

In practice, the loop runs inside LangChain's AgentExecutor. Each iteration: (1) LLM call with current conversation + tool results, (2) parse output for action or final answer, (3) if action, execute tool (e.g., Supabase query, external API), (4) append result to memory, (5) repeat. The critical property is that the LLM has no inherent stop condition; it relies on prompt instructions or a max-iteration parameter. Without explicit cost control, a single user request can trigger 20+ LLM calls, each costing $0.01–$0.03, which puts one request at $0.20–$0.60 or more.
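The per-request figure is simply the number of LLM calls multiplied by the per-call price. A minimal sketch of that arithmetic, assuming a flat price per call (real pricing depends on input and output token counts); `estimateRequestCost` and its parameters are illustrative names, not part of any library.

```typescript
// Hypothetical helper: estimate the cost range of one agent request,
// assuming every LLM call costs between minPerCall and maxPerCall dollars.
interface CostRange {
  min: number
  max: number
}

function estimateRequestCost(
  llmCalls: number,
  minPerCall: number,
  maxPerCall: number
): CostRange {
  if (llmCalls < 0 || minPerCall < 0 || maxPerCall < minPerCall) {
    throw new RangeError('Invalid cost inputs')
  }
  // Round to whole cents to hide floating-point noise
  const toCents = (dollars: number): number => Math.round(dollars * 100) / 100
  return {
    min: toCents(llmCalls * minPerCall),
    max: toCents(llmCalls * maxPerCall),
  }
}

// 20 calls at $0.01 to $0.03 each
const range = estimateRequestCost(20, 0.01, 0.03)
console.log(`20 calls: $${range.min.toFixed(2)} to $${range.max.toFixed(2)}`)
```

Because the call count is the only multiplier, capping iterations caps the worst-case cost of a request directly.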

Use this pattern when you need autonomous multi-step reasoning: data enrichment pipelines, customer support triage, or research assistants. It matters in production because naive loops are the #1 cause of runaway costs in LLM applications. A single buggy tool response (e.g., a Supabase query returning an unexpected null) can cause the agent to retry or hallucinate indefinitely, turning a $0.05 request into a $5.00 bill. Always pair the loop with a hard iteration cap (e.g., 10 turns) and a token budget per turn.
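The pairing of an iteration cap with a token budget can be sketched without any framework. This is an illustration under stated assumptions, not LangChain's implementation: `runBoundedLoop`, `StepResult`, and `StopReason` are hypothetical names, and the step function is synchronous only to keep the sketch short (a real LLM call would be async).

```typescript
// Illustrative types: none of these names come from LangChain.
type StepResult =
  | { kind: 'final'; answer: string; tokensUsed: number }
  | { kind: 'action'; tool: string; tokensUsed: number }

type StopReason = 'final_answer' | 'max_iterations' | 'token_budget'

interface LoopOutcome {
  reason: StopReason
  iterations: number
  totalTokens: number
  answer?: string
}

// Runs `step` until it returns a final answer, the iteration cap is hit,
// or the cumulative token budget is exhausted, whichever comes first.
function runBoundedLoop(
  step: (iteration: number) => StepResult,
  maxIterations: number,
  tokenBudget: number
): LoopOutcome {
  let totalTokens = 0
  for (let i = 1; i <= maxIterations; i++) {
    const result = step(i)
    totalTokens += result.tokensUsed
    if (result.kind === 'final') {
      return { reason: 'final_answer', iterations: i, totalTokens, answer: result.answer }
    }
    if (totalTokens >= tokenBudget) {
      return { reason: 'token_budget', iterations: i, totalTokens }
    }
  }
  return { reason: 'max_iterations', iterations: maxIterations, totalTokens }
}

// A stub "model" that never finishes: it always asks for the same tool.
const stuckStep = (): StepResult => ({ kind: 'action', tool: 'search', tokensUsed: 500 })

const capped = runBoundedLoop(stuckStep, 10, 100_000)
const budgeted = runBoundedLoop(stuckStep, 10, 2_000)
console.log(capped.reason, capped.iterations)
console.log(budgeted.reason, budgeted.iterations)
```

Both limits are enforced in code, so the loop ends even when the model never produces a final answer. Returning the stop reason lets the caller tell a normal finish apart from a forced one.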

The Infinite Loop Is Not a Bug — It's a Feature of the Architecture
The LLM will happily keep calling tools forever if you let it. The loop doesn't 'break' — it just keeps spending. You must enforce a max-iteration limit and a total token budget.
Production Insight
A customer support agent with a Supabase tool that returns empty results for missing user IDs caused the agent to retry the same query 47 times in one session, burning $1.20 in LLM calls before the user canceled.
Symptom: a single user request generates 30+ LLM calls in server logs, all repeating the same tool call pattern.
Rule of thumb: always set maxIterations ≤ 10 and wrap each tool call in a try-catch that returns a clear 'no data' message to break the loop.
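One way to catch the repeated-query symptom is to fingerprint each tool call and refuse repeats. The sketch below is a hypothetical guard, not a LangChain feature: `createRepeatGuard` is an invented name, and the fingerprint only normalizes top-level argument keys.

```typescript
// Hypothetical guard: tracks tool calls within ONE request and flags repeats.
// Create a new guard per user request so state never leaks across sessions.
function createRepeatGuard(maxRepeats: number) {
  const counts = new Map<string, number>()

  // Stable fingerprint: tool name plus arguments with top-level keys sorted.
  const fingerprint = (tool: string, args: Record<string, unknown>): string => {
    const sorted = Object.keys(args)
      .sort()
      .map((key) => [key, args[key]])
    return `${tool}:${JSON.stringify(sorted)}`
  }

  return {
    // Returns null if the call may proceed, or a message for the model if not.
    check(tool: string, args: Record<string, unknown>): string | null {
      const key = fingerprint(tool, args)
      const seen = (counts.get(key) ?? 0) + 1
      counts.set(key, seen)
      if (seen > maxRepeats) {
        return `You already called ${tool} with these arguments ${maxRepeats} time(s) and got no new data. Stop retrying and answer with what you have.`
      }
      return null
    },
  }
}

const guard = createRepeatGuard(1)
const first = guard.check('get_user', { userId: 'u-404' })
const second = guard.check('get_user', { userId: 'u-404' })
const different = guard.check('get_user', { userId: 'u-1' })
console.log(first, second !== null, different)
```

Calling `check` at the top of a tool function and returning its message in place of the tool result gives the model an explicit signal to stop retrying.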
Key Takeaway
An AI agent loop is a recursive LLM-tool cycle that must be bounded by code, not by the model's will.
Without a hard iteration cap and token budget, a single request can cost more than a month of traditional API usage.
Always log iteration count and cumulative token cost per session — treat them as critical metrics, not debug info.
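Tracking those two metrics takes very little code. A minimal sketch, assuming token counts are available after each LLM call; `SessionCostTracker` is an illustrative name, and the prices in the example are placeholders rather than actual rates.

```typescript
// Illustrative tracker. Prices are passed in, not hard-coded,
// because per-token pricing varies by model and changes over time.
interface UsageRecord {
  inputTokens: number
  outputTokens: number
}

class SessionCostTracker {
  private iterations = 0
  private inputTokens = 0
  private outputTokens = 0
  private readonly inputPricePerMillion: number
  private readonly outputPricePerMillion: number

  constructor(inputPricePerMillion: number, outputPricePerMillion: number) {
    this.inputPricePerMillion = inputPricePerMillion
    this.outputPricePerMillion = outputPricePerMillion
  }

  // Call once after every LLM call in the loop
  record(usage: UsageRecord): void {
    this.iterations += 1
    this.inputTokens += usage.inputTokens
    this.outputTokens += usage.outputTokens
  }

  // Metrics to log per session: iteration count, tokens, and dollars
  snapshot(): { iterations: number; totalTokens: number; costUsd: number } {
    const cost =
      (this.inputTokens / 1_000_000) * this.inputPricePerMillion +
      (this.outputTokens / 1_000_000) * this.outputPricePerMillion
    return {
      iterations: this.iterations,
      totalTokens: this.inputTokens + this.outputTokens,
      costUsd: Number(cost.toFixed(4)),
    }
  }
}

// Placeholder prices: $2.50 per million input tokens, $10 per million output tokens
const tracker = new SessionCostTracker(2.5, 10)
tracker.record({ inputTokens: 1000, outputTokens: 250 })
tracker.record({ inputTokens: 1000, outputTokens: 250 })
tracker.record({ inputTokens: 2000, outputTokens: 500 })
const metrics = tracker.snapshot()
console.log(metrics)
```

Emitting the snapshot to your logging or metrics pipeline at the end of each request makes a runaway session visible as soon as its iteration count or cost spikes.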

Architecture: Agent, Tools, Memory, and Runtime

An AI agent has four components that work together in a loop. The LLM is the reasoning engine: it reads the conversation history and tool descriptions, decides which tool to call, and interprets the result. Tools are external functions the agent can invoke, such as database queries, code execution, and API calls. Memory stores conversation history so the agent maintains context across turns. The runtime (AgentExecutor) orchestrates the loop: send context to the LLM, parse the tool call, execute the tool, append the result, repeat.

The key difference between an agent and a chain: a chain follows a fixed sequence of steps (retrieve -> format -> generate). An agent decides the sequence dynamically — it might search, then calculate, then search again, then answer. This flexibility comes at a cost: each step is a separate LLM call, adding latency and token usage.
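The contrast can be shown with plain functions. In this sketch the agent's decision is a stub function standing in for an LLM call; `runChain`, `runAgent`, and the toy tools are hypothetical names used only for illustration.

```typescript
type Step = (input: string) => string

// Chain: the order of steps is fixed in code.
function runChain(steps: Step[], input: string): string {
  return steps.reduce((value, step) => step(value), input)
}

// Agent: a decision function picks the next step by name on every turn.
// In a real agent the decision is an LLM call; here it is a plain function.
function runAgent(
  tools: Record<string, Step>,
  decide: (state: string, turn: number) => string | null,
  input: string,
  maxTurns: number
): { output: string; trace: string[] } {
  let state = input
  const trace: string[] = []
  for (let turn = 0; turn < maxTurns; turn++) {
    const next = decide(state, turn)
    if (next === null) break // the decision function chose to stop
    const tool = tools[next]
    if (!tool) throw new Error(`Unknown tool: ${next}`)
    state = tool(state)
    trace.push(next)
  }
  return { output: state, trace }
}

const toyTools: Record<string, Step> = {
  search: (s) => `${s}+searched`,
  calculate: (s) => `${s}+calculated`,
}

const chainOutput = runChain([toyTools.search, toyTools.calculate], 'q')

// Stub policy: search, calculate, search again, then stop.
const plan = ['search', 'calculate', 'search']
const agentRun = runAgent(toyTools, (_state, turn) => plan[turn] ?? null, 'q', 10)
console.log(chainOutput, agentRun.trace)
```

The chain always makes the same two calls. The agent makes however many calls its decision function asks for, which is why it needs the `maxTurns` cap.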

LangChain's ReAct agent type implements this pattern. The prompt instructs the LLM to Think (reason about what to do), Act (call a tool with specific inputs), and Observe (read the tool result). The loop continues until the agent decides it has enough information to answer, or until AgentExecutor's maxIterations limit is reached.

io.thecodeforge.ai-agent.architecture.ts · TYPESCRIPT
// ============================================
// AI Agent Architecture — Core Components (2026 LangChain 0.3+)
// ============================================

// ---- Component 1: LLM (Reasoning Engine) ----
// The LLM decides which tools to call and interprets results

import { ChatOpenAI } from '@langchain/openai'

const llm = new ChatOpenAI({
  modelName: 'gpt-4o',
  temperature: 0, // Deterministic for tool-calling — reduces hallucination
  openAIApiKey: process.env.OPENAI_API_KEY,
})

// ---- Component 2: Tools (External Capabilities) ----
// Each tool has a name, description, schema, and execution function

import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: 'Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.',
  schema: z.object({
    query: z.string().describe('The search query — use keywords from the user question'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general']).optional()
      .describe('Filter by category if the topic is clear'),
  }),
  func: async ({ query, category }) => {
    // Search Supabase vector store
    const results = await searchDocuments(query, category)
    if (results.length === 0) {
      return 'No results found. Try a different search query or answer from general knowledge.'
    }
    return results.map((r) => `[${r.title}]: ${r.content}`).join('\n\n')
  },
})

const getOrderStatus = new DynamicStructuredTool({
  name: 'get_order_status',
  description: 'Look up the status of a customer order by order ID. Returns shipping status, estimated delivery, and tracking number.',
  schema: z.object({
    orderId: z.string().describe('The order ID — format: ORD-XXXXX'),
  }),
  func: async ({ orderId }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) {
      return `Order ${orderId} not found. Ask the user to verify the order ID.`
    }
    return JSON.stringify({
      status: order.status,
      estimatedDelivery: order.estimatedDelivery,
      trackingNumber: order.trackingNumber,
    })
  },
})

const calculateRefund = new DynamicStructuredTool({
  name: 'calculate_refund',
  description: 'Calculate the refund amount for a return. Considers order total, return reason, and days since purchase.',
  schema: z.object({
    orderId: z.string().describe('The order ID'),
    returnReason: z.enum(['defective', 'wrong_item', 'changed_mind', 'not_as_described'])
      .describe('The reason for the return'),
  }),
  func: async ({ orderId, returnReason }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) return 'Order not found.'

    const daysSincePurchase = Math.floor(
      (Date.now() - new Date(order.createdAt).getTime()) / (1000 * 60 * 60 * 24)
    )

    let refundPercentage = 1.0
    if (returnReason === 'changed_mind' && daysSincePurchase > 30) {
      refundPercentage = 0.0
    } else if (returnReason === 'changed_mind') {
      refundPercentage = 0.85
    }

    const refundAmount = order.total * refundPercentage
    return JSON.stringify({ refundAmount, refundPercentage, daysSincePurchase })
  },
})

// ---- Component 3: Memory (Conversation History) ----
// Stored in Supabase PostgreSQL for persistence across requests
// Assumes `supabase` is an initialized Supabase client defined elsewhere;
// check the import path below against your installed @langchain/community version
import { SupabaseChatMessageHistory } from '@langchain/community/stores/message/supabase'

function createMemory(conversationId: string) {
  return new SupabaseChatMessageHistory({
    supabaseClient: supabase,
    tableName: 'messages',
    sessionId: conversationId,
  })
}

// ---- Component 4: Agent (2026 pattern) ----
// Use createToolCallingAgent + RunnableWithMessageHistory
import { createToolCallingAgent } from 'langchain/agents'
import { RunnableWithMessageHistory } from '@langchain/core/runnables'
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'

const prompt = ChatPromptTemplate.fromMessages([
  ['system', `You are a helpful customer support agent for Acme Corp.

You have access to the following tools:
- search_knowledge_base: Search the internal knowledge base
- get_order_status: Look up order status by order ID
- calculate_refund: Calculate refund amounts for returns

Rules:
1. Always search the knowledge base before answering product questions
2. If a tool returns no new information, answer from what you know
3. Never make up order statuses — always use get_order_status
4. If you cannot answer, say so and offer to connect with a human agent
5. Maximum 10 tool calls per response — synthesize your answer after that`],
  new MessagesPlaceholder('chat_history'),
  ['human', '{input}'],
  new MessagesPlaceholder('agent_scratchpad'),
])

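// AgentExecutor runs the think/act/observe loop and enforces the iteration cap
import { AgentExecutor } from 'langchain/agents'

async function createAgent(conversationId: string) {
  const tools = [searchKnowledgeBase, getOrderStatus, calculateRefund]

  // createToolCallingAgent only builds the decision step; it does not loop
  const agent = await createToolCallingAgent({
    llm,
    tools,
    prompt,
  })

  const executor = new AgentExecutor({
    agent,
    tools,
    maxIterations: 10, // Hard cap: the loop stops here even if the model wants to continue
    returnIntermediateSteps: true, // Exposes steps so callers can log iteration counts
  })

  // Callers must pass { configurable: { sessionId: conversationId } } to invoke()
  const agentWithHistory = new RunnableWithMessageHistory({
    runnable: executor,
    getMessageHistory: () => createMemory(conversationId),
    inputMessagesKey: 'input',
    outputMessagesKey: 'output',
    historyMessagesKey: 'chat_history',
  })

  return agentWithHistory
}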
The Agent Loop: Think, Act, Observe
  • Think: the LLM reads the conversation history and tool descriptions, decides which tool to call
  • Act: the agent calls the tool with specific inputs parsed from the LLM output
  • Observe: the tool result is appended to the conversation, and the LLM decides the next step
  • Each step is a separate LLM call — latency and cost scale with the number of iterations
  • maxIterations on AgentExecutor (maxSteps in the Vercel AI SDK) stops the loop; without a cap, the agent can keep running and drain API credits
Production Insight
Each agent step is a separate LLM call — 10 iterations means 10 API calls per user message.
maxIterations is the safety net: without a cap, agents can loop indefinitely and drain API credits.
Rule: set maxIterations to 10 for production, add cost monitoring per conversation.
Key Takeaway
An agent decides tool sequences dynamically — unlike chains which follow fixed steps.
createToolCallingAgent builds the agent and AgentExecutor orchestrates the loop; maxIterations prevents infinite loops and cost overruns.
Each iteration is a separate LLM call — cost and latency scale with step count.

Supabase: Memory, Vector Search, and Conversation Storage

Supabase serves three roles in the agent architecture. PostgreSQL stores conversation history — every message (user and assistant) is persisted for session continuity. pgvector stores document embeddings for semantic search — the agent can retrieve relevant knowledge base entries. The Supabase client provides real-time subscriptions if you want to show agent activity to the user in real time.

The conversation storage pattern: each conversation has a unique ID. Messages are appended to a messages table with the conversation_id as a foreign key. When the agent processes a new message, it loads the conversation history from Supabase and prepends it to the LLM context. This gives the agent memory across turns without storing state in the application server.

Vector search uses pgvector's cosine similarity. Documents are chunked, embedded with OpenAI's text-embedding-3-small model, and stored with their embedding vectors. The match_documents RPC function performs nearest-neighbor search and returns the top-k results above a similarity threshold. Use 0.78–0.82 as a starting point for text-embedding-3-small, then tune it against your own data, because score distributions differ between embedding models.
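The same thresholded top-k logic can be expressed in memory, which makes it easier to see what the database function computes. This is an illustrative sketch, not the pgvector implementation: it uses toy 2-dimensional vectors where real embeddings have 1536 dimensions, and `matchInMemory` is a hypothetical name. pgvector's `<=>` operator returns cosine distance, so similarity is 1 minus that value.

```typescript
// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Vectors must be non-empty and the same length')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

interface EmbeddedDoc {
  title: string
  embedding: number[]
}

// Keep documents above the threshold, best match first, at most matchCount.
function matchInMemory(
  queryEmbedding: number[],
  docs: EmbeddedDoc[],
  matchCount: number,
  matchThreshold: number
): { title: string; similarity: number }[] {
  return docs
    .map((doc) => ({
      title: doc.title,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .filter((match) => match.similarity > matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount)
}

const sampleDocs: EmbeddedDoc[] = [
  { title: 'shipping-times', embedding: [0, 1] },
  { title: 'returns-faq', embedding: [0.9, 0.1] },
  { title: 'refund-policy', embedding: [1, 0] },
  { title: 'gift-cards', embedding: [0.6, 0.8] },
]

const matches = matchInMemory([1, 0], sampleDocs, 5, 0.8)
console.log(matches.map((m) => m.title))
```

Raising the threshold trades recall for precision. With the toy data above, a threshold of 0.5 would also admit `gift-cards`, whose similarity to the query is 0.6.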

io.thecodeforge.ai-agent.supabase-schema.sql · SQL
-- ============================================
-- Supabase Schema for AI Agent
-- ============================================

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- ---- Conversations Table ----
-- Each conversation has a unique ID and tracks the user
CREATE TABLE public.conversations (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  title TEXT,
  total_tokens INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.conversations ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage their own conversations"
  ON public.conversations
  FOR ALL
  USING (auth.uid() = user_id)
  WITH CHECK (auth.uid() = user_id);

-- ---- Messages Table ----
-- Stores each message in a conversation
CREATE TABLE public.messages (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system', 'tool')),
  content TEXT NOT NULL,
  tool_calls JSONB, -- Stores tool call details for assistant messages
  tool_call_id TEXT, -- Links tool response to the original call
  token_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.messages ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage messages in their conversations"
  ON public.messages
  FOR ALL
  USING (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  )
  WITH CHECK (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  );

-- Index for fast conversation history retrieval
CREATE INDEX idx_messages_conversation_created
  ON public.messages (conversation_id, created_at ASC);

-- ---- Documents Table ----
-- Knowledge base with vector embeddings
CREATE TABLE public.documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  category TEXT, -- product, pricing, troubleshooting, general
  embedding vector(1536), -- text-embedding-3-small produces 1536-dim vectors
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Documents are readable by authenticated users"
  ON public.documents
  FOR SELECT
  USING (auth.role() = 'authenticated');

-- Index for vector similarity search
CREATE INDEX ON public.documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- ---- Vector Search RPC Function ----
-- Performs nearest-neighbor search using cosine similarity
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_count INTEGER DEFAULT 5,
  match_threshold FLOAT DEFAULT 0.8, -- Production value for text-embedding-3-small
  filter_category TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  category TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.title,
    documents.content,
    documents.category,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM public.documents
  WHERE
    (filter_category IS NULL OR documents.category = filter_category)
    AND 1 - (documents.embedding <=> query_embedding) > match_threshold
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

-- ---- Conversation Summaries Table ----
-- Stores summarized history for long conversations
CREATE TABLE public.conversation_summaries (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  summary TEXT NOT NULL,
  message_count INTEGER NOT NULL, -- Number of messages summarized
  token_count INTEGER NOT NULL, -- Token count of the original messages
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- ---- Updated At Trigger ----
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$;

CREATE TRIGGER conversations_updated_at
  BEFORE UPDATE ON public.conversations
  FOR EACH ROW
  EXECUTE FUNCTION update_updated_at();
Supabase as the Agent's Memory Layer
  • PostgreSQL stores conversation history — every message persisted for session continuity
  • pgvector stores document embeddings — agent retrieves relevant knowledge via semantic search
  • Conversation summaries compress long histories — control token costs for extended sessions
  • RLS policies ensure users only access their own conversations — security at the database level
  • The match_documents RPC function performs nearest-neighbor search with configurable threshold (0.78-0.82 recommended)
Production Insight
pgvector cosine similarity search retrieves relevant documents — threshold filters low-quality matches (use 0.78-0.82 for text-embedding-3-small).
Conversation history grows linearly with turns — summarize after 8,000 tokens to control costs.
Rule: store every message, track token counts per conversation, summarize when history is too long.
Key Takeaway
Supabase serves three roles: conversation storage, vector search, and optional real-time updates.
pgvector match_documents RPC performs semantic search — cosine similarity with threshold filtering.
Summarize long conversations to control token costs — store summaries in a separate table.
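Deciding when to summarize can be a pure function over the stored token counts. A minimal sketch, assuming each message carries a `tokenCount` like the `token_count` column in the schema; `planSummarization` and the `keepRecent` parameter are illustrative, and producing the summary itself (an LLM call) is out of scope here.

```typescript
interface StoredMessage {
  role: 'user' | 'assistant' | 'system' | 'tool'
  content: string
  tokenCount: number
}

interface SummarizationPlan {
  toSummarize: StoredMessage[] // older messages to compress into a summary
  toKeep: StoredMessage[] // recent messages sent to the model verbatim
  totalTokens: number
}

// Once history exceeds tokenLimit, summarize everything except the
// most recent keepRecent messages. Messages must be in chronological order.
function planSummarization(
  messages: StoredMessage[],
  tokenLimit: number,
  keepRecent: number
): SummarizationPlan {
  const totalTokens = messages.reduce((sum, m) => sum + m.tokenCount, 0)
  if (totalTokens <= tokenLimit || messages.length <= keepRecent) {
    return { toSummarize: [], toKeep: messages, totalTokens }
  }
  const cut = messages.length - keepRecent
  return {
    toSummarize: messages.slice(0, cut),
    toKeep: messages.slice(cut),
    totalTokens,
  }
}

// Ten messages of 1,000 tokens each, against the 8,000-token limit
const longHistory: StoredMessage[] = Array.from({ length: 10 }, (_, i) => ({
  role: i % 2 === 0 ? ('user' as const) : ('assistant' as const),
  content: `message ${i + 1}`,
  tokenCount: 1000,
}))

const summaryPlan = planSummarization(longHistory, 8000, 4)
console.log(summaryPlan.toSummarize.length, summaryPlan.toKeep.length)
```

Keeping the most recent messages verbatim preserves the immediate context, while the older span is replaced by a single summary row.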

Next.js Integration: Streaming API and Server Actions

The Next.js integration has two concerns: the API layer that processes agent requests, and the streaming layer that delivers token-by-token responses to the client. The Vercel AI SDK (ai package) handles both: it provides useChat for the client and streamText, with toDataStreamResponse(), for the server.

The API route receives the user's message, loads conversation history from Supabase, runs the agent, and streams the response. The client uses useChat to manage the message list and display streaming tokens. The key pattern: the Route Handler returns streamText(...).toDataStreamResponse(), which is a streaming response that the client consumes incrementally.

For non-streaming use cases (background processing, webhook-triggered agents), use Server Actions instead. They run the agent synchronously and return the result. Server Actions are simpler but do not support streaming — the user waits for the full response.

io.thecodeforge.ai-agent.client-component.tsx · TSX
// ============================================
// Client Component: Chat Interface with Streaming
// ============================================
// File: components/chat-interface.tsx

'use client'

import { useChat } from 'ai/react'
import { useState } from 'react'
import { Button } from '@/components/ui/button'
import { Input } from '@/components/ui/input'

export function ChatInterface({ conversationId }: { conversationId?: string }) {
  const [activeConversationId, setActiveConversationId] = useState(conversationId)

  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
    body: { conversationId: activeConversationId },
    onFinish: (message) => {
      // Extract conversation ID from response headers
      // (set by the server on the streaming response)
    },
    onError: (error) => {
      console.error('Chat error:', error)
    },
  })

  return (
    <div className="flex flex-col h-[600px] max-w-2xl mx-auto">
      {/* Messages */}
      <div className="flex-1 overflow-y-auto space-y-4 p-4">
        {messages.map((message) => (
          <div
            key={message.id}
            className={`flex ${message.role === 'user' ? 'justify-end' : 'justify-start'}`}
          >
            <div
              className={`max-w-[80%] rounded-lg px-4 py-2 ${
                message.role === 'user'
                  ? 'bg-primary text-primary-foreground'
                  : 'bg-muted'
              }`}
            >
              <p className="text-sm whitespace-pre-wrap">{message.content}</p>
            </div>
          </div>
        ))}

        {isLoading && (
          <div className="flex justify-start">
            <div className="bg-muted rounded-lg px-4 py-2">
              <div className="flex space-x-1">
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce" />
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce delay-100" />
                <div className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce delay-200" />
              </div>
            </div>
          </div>
        )}
      </div>

      {/* Input */}
      <form onSubmit={handleSubmit} className="flex gap-2 p-4 border-t">
        <Input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask a question..."
          disabled={isLoading}
          className="flex-1"
        />
        <Button type="submit" disabled={isLoading}>
          Send
        </Button>
      </form>
    </div>
  )
}

// ============================================
// Server Action: Non-Streaming Agent (Background Processing)
// ============================================
// File: app/actions/agent.ts

'use server'

import { createAgent } from '@/lib/agent/factory'
import { createClient } from '@/lib/supabase/server'

export async function processAgentMessage(
  conversationId: string,
  message: string
) {
  const supabase = await createClient()
  const { data: { user } } = await supabase.auth.getUser()

  if (!user) {
    throw new Error('Unauthorized')
  }

  // Store user message
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'user',
    content: message,
  })

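  // Run agent (non-streaming); RunnableWithMessageHistory requires a sessionId in config
  const agent = await createAgent(conversationId)
  const result = await agent.invoke(
    { input: message },
    { configurable: { sessionId: conversationId } }
  )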

  // Store assistant response
  await supabase.from('messages').insert({
    conversation_id: conversationId,
    role: 'assistant',
    content: result.output,
  })

  return {
    answer: result.output,
    steps: result.intermediateSteps?.length ?? 0,
  }
}
Streaming vs Server Actions: When to Use Each
  • Streaming (Route Handler + streamText): real-time user-facing chat — user sees tokens as they generate
  • Server Actions: background processing, webhook-triggered agents, batch operations — no streaming
  • Route Handlers must return streamText().toDataStreamResponse() — not a plain Response or JSON
  • useChat manages the message list and streaming state — do not manually manage messages
  • Server Actions cannot stream — the user waits for the full response before seeing anything
Production Insight
Route Handlers with streamText().toDataStreamResponse() enable real-time token streaming to the client.
Server Actions are simpler but cannot stream — use them for background processing only.
Rule: use Route Handlers for user-facing chat, Server Actions for background agent tasks.
Key Takeaway
Streaming requires Route Handler + streamText() + useChat on the client.
Server Actions run agents synchronously — no streaming, simpler but slower perceived performance.
Store every message in Supabase — conversation history enables session continuity.

Tools: Building and Validating Agent Capabilities

Tools are the agent's external capabilities — each tool is a function with a name, description, and input schema. The LLM reads the tool descriptions to decide which tool to call, and parses the input schema to generate valid arguments. The quality of the description directly affects the agent's ability to use the tool correctly.

The critical pattern: validate tool inputs with Zod before execution. LLMs can hallucinate invalid inputs — missing required fields, wrong types, or malformed queries. Zod validation catches these before the tool executes, returning a clear error message that the agent can learn from.

Tool design principles: descriptions should be specific (not just "search the database"), input schemas should use .describe() on every field (the LLM reads these descriptions), and error messages should be actionable (tell the agent what went wrong and how to fix it).
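The "actionable error" principle is easiest to see in isolation. The sketch below is a dependency-free stand-in for a schema check (in the real tools Zod does this work); `validateTicketInput` and its rules are hypothetical and only illustrate returning a message the agent can act on, not throwing.

```typescript
// Dependency-free stand-in for a schema check such as Zod's safeParse.
const PRIORITIES = ['low', 'medium', 'high', 'urgent'] as const

interface TicketInput {
  subject: string
  priority: string
}

type ValidationResult =
  | { ok: true; value: TicketInput }
  | { ok: false; message: string }

function validateTicketInput(raw: unknown): ValidationResult {
  const input = (raw ?? {}) as Record<string, unknown>
  const problems: string[] = []

  const subject = input.subject
  if (typeof subject !== 'string') {
    problems.push('subject is required and must be a string')
  } else if (subject.length < 10) {
    problems.push('subject must be at least 10 characters; provide a clear summary')
  }

  const priority = input.priority
  if (typeof priority !== 'string' || !(PRIORITIES as readonly string[]).includes(priority)) {
    problems.push(`priority must be one of: ${PRIORITIES.join(', ')}`)
  }

  if (problems.length > 0) {
    // One message listing every problem, so the model can fix them in one retry
    return {
      ok: false,
      message: `Invalid input: ${problems.join('; ')}. Fix these fields and call the tool again.`,
    }
  }
  return { ok: true, value: { subject: subject as string, priority: priority as string } }
}

const rejected = validateTicketInput({ subject: 'short', priority: 'critical' })
const accepted = validateTicketInput({ subject: 'Cannot log in to dashboard', priority: 'high' })
console.log(rejected, accepted.ok)
```

Listing every problem at once matters for cost: a model that learns about one invalid field per attempt needs one extra LLM call per field.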

io.thecodeforge.ai-agent.tools.ts · TYPESCRIPT
// ============================================
// Tool Design Patterns
// ============================================

import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

// ---- Good Tool: Clear description, validated inputs, actionable errors + deduplication ----

export const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: `Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.
Do NOT use this tool for:
- Order status queries (use get_order_status instead)
- Refund calculations (use calculate_refund instead)
- Questions about the current date or time`,
  schema: z.object({
    query: z.string()
      .min(3, 'Query must be at least 3 characters')
      .max(200, 'Query must be under 200 characters')
      .describe('The search query — use specific keywords from the user question. Avoid full sentences.'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general'])
      .optional()
      .describe('Filter by category if the topic is clear from the question'),
    maxResults: z.number()
      .min(1)
      .max(10)
      .default(5)
      .describe('Maximum number of results to return — use 3 for focused answers, 5 for comprehensive'),
  }),
  func: async ({ query, category, maxResults }) => {
    try {
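      // Deduplication guardrail.
      // Caution: this Set lives on globalThis, so it is shared by every request
      // this server process handles and is never cleared. Scope it per conversation
      // (for example a Map keyed by conversation ID) before using it in production.
      const g = globalThis as typeof globalThis & { seenResults?: Set<string> }
      const resultHash = JSON.stringify({ query, category })
      g.seenResults ??= new Set<string>()
      if (g.seenResults.has(resultHash)) {
        return "You've already seen this result. Answer from context or ask for clarification."
      }
      g.seenResults.add(resultHash)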

      // Generate embedding for the query
      const embedding = await generateEmbedding(query)

      // Search Supabase vector store
      const { data, error } = await supabase.rpc('match_documents', {
        query_embedding: embedding,
        match_count: maxResults,
        match_threshold: 0.8, // Production value
        filter_category: category ?? null,
      })

      if (error) {
        return `Search failed: ${error.message}. Try a simpler query or remove the category filter.`
      }

      if (!data || data.length === 0) {
        return `No results found for "${query}". Try:
1. Using different keywords
2. Removing the category filter
3. Broadening the search terms`
      }

      return data.map((doc: any, i: number) =>
        `[Result ${i + 1}] (similarity: ${doc.similarity.toFixed(2)})
Title: ${doc.title}
Content: ${doc.content.slice(0, 500)}...`
      ).join('\n\n')
    } catch (err) {
      return `Search error: ${err instanceof Error ? err.message : 'Unknown error'}. The user should try again or contact support.`
    }
  },
})

// ---- Good Tool: Input validation with meaningful error messages ----

export const createTicket = new DynamicStructuredTool({
  name: 'create_support_ticket',
  description: 'Create a support ticket when the agent cannot resolve the issue. Use this as a last resort after exhausting available tools.',
  schema: z.object({
    subject: z.string()
      .min(10, 'Subject must be at least 10 characters — provide a clear summary')
      .max(200)
      .describe('Brief subject line summarizing the issue'),
    description: z.string()
      .min(50, 'Description must be at least 50 characters — include what was tried and what failed')
      .max(2000)
      .describe('Detailed description including what the user tried and what went wrong'),
    priority: z.enum(['low', 'medium', 'high', 'urgent'])
      .describe('Priority based on impact: low = question, medium = minor issue, high = blocked, urgent = production down'),
    category: z.enum(['billing', 'technical', 'account', 'feature_request'])
      .describe('Category of the issue'),
  }),
  func: async ({ subject, description, priority, category }) => {
    // Validate business rules
    if (priority === 'urgent' && category !== 'technical') {
      return 'Urgent priority is only available for technical issues. Please use high priority instead.'
    }

    const ticket = await createTicketInDatabase({ subject, description, priority, category })

    return `Ticket created successfully.
Ticket ID: ${ticket.id}
Priority: ${priority}
Expected response time: ${getResponseTime(priority)}

Tell the user: "I've created a support ticket (ID: ${ticket.id}) for you. A team member will respond within ${getResponseTime(priority)}."`
  },
})

// ---- Helper: Generate embedding ----
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  })
  return response.data[0].embedding
}

function getResponseTime(priority: string): string {
  const times: Record<string, string> = {
    low: '48 hours',
    medium: '24 hours',
    high: '4 hours',
    urgent: '1 hour',
  }
  return times[priority] ?? '24 hours'
}
Tool Design Best Practices
  • Descriptions must explain WHEN to use the tool — not just what it does
  • Every input field needs .describe() — the LLM reads these to generate valid arguments
  • Validate inputs with Zod before execution — LLMs can hallucinate invalid inputs
  • Error messages must be actionable — tell the agent what went wrong and how to retry
  • Return formatted text, not raw JSON — LLMs parse natural language better than JSON
Production Insight
Tool descriptions are the LLM's instructions — vague descriptions cause wrong tool selection.
Zod validation catches hallucinated inputs before execution — return clear error messages.
Rule: describe when to use the tool, validate every input, return actionable error messages.
Key Takeaway
Tool quality determines agent quality — descriptions guide the LLM's tool selection.
Validate inputs with Zod — LLMs hallucinate invalid arguments, especially in multi-step workflows.
Error messages must guide the agent — tell it what failed and how to retry with different inputs.
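
The deduplication guardrail in the search tool keeps its keys on globalThis, which every conversation in the same process shares. A minimal sketch of a per-conversation variant follows, assuming a conversationId is available wherever the tool runs. The ToolCallDeduper class and its method names are hypothetical and not part of LangChain.

```typescript
// Hypothetical helper: remembers which (tool, arguments) pairs each conversation has already run.
class ToolCallDeduper {
  private seen = new Map<string, Set<string>>()

  // Returns true the first time a call is seen for this conversation, false on repeats.
  firstTime(conversationId: string, toolName: string, args: Record<string, unknown>): boolean {
    const key = `${toolName}:${stableStringify(args)}`
    let calls = this.seen.get(conversationId)
    if (!calls) {
      calls = new Set<string>()
      this.seen.set(conversationId, calls)
    }
    if (calls.has(key)) return false
    calls.add(key)
    return true
  }

  // Call when the conversation ends so the map does not grow forever.
  clear(conversationId: string): void {
    this.seen.delete(conversationId)
  }
}

// Sort keys so { a, b } and { b, a } produce the same string.
function stableStringify(obj: Record<string, unknown>): string {
  const sorted = Object.keys(obj).sort().map((k) => [k, obj[k]])
  return JSON.stringify(sorted)
}

const deduper = new ToolCallDeduper()
const firstCall = deduper.firstTime('conv-1', 'search_knowledge_base', { query: 'refund policy', category: 'pricing' })
const repeatCall = deduper.firstTime('conv-1', 'search_knowledge_base', { category: 'pricing', query: 'refund policy' })
const otherConversation = deduper.firstTime('conv-2', 'search_knowledge_base', { query: 'refund policy', category: 'pricing' })
```

Here firstCall and otherConversation are true while repeatCall is false, so one user's searches never block another's. In a serverless deployment this in-memory map only lives as long as the instance; the same idea carries over to a shared store such as Redis.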

Cost Control: Token Management and Conversation Summarization

Token costs are the primary production concern for AI agents. Each agent iteration is a separate LLM call that includes the full conversation history, tool descriptions, and the agent's reasoning. A 20-turn conversation with 10 tool calls per turn can consume 100,000+ tokens — costing $1-10 depending on the model.
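
To see why resending history compounds, here is a small worked calculation. It is a sketch under stated assumptions: 200 LLM calls, a 1,000-token starting prompt, and 150 tokens of new history per call are illustrative numbers, not measurements. The input prices are the ones used in the MODEL_COSTS table in this article.

```typescript
// Input-token prices in USD per token, as listed in this article's MODEL_COSTS table.
const INPUT_PRICE = {
  'gpt-4o': 2.5 / 1_000_000,
  'gpt-4o-mini': 0.15 / 1_000_000,
} as const

// Total input tokens when call i resends a prompt of (baseTokens + i * growthPerCall) tokens.
function totalInputTokens(calls: number, baseTokens: number, growthPerCall: number): number {
  let total = 0
  for (let i = 0; i < calls; i++) {
    total += baseTokens + i * growthPerCall
  }
  return total
}

// Assumed workload: 200 calls, 1,000-token starting prompt, 150 tokens added per call.
const loopTokens = totalInputTokens(200, 1_000, 150)
const costGpt4o = loopTokens * INPUT_PRICE['gpt-4o']
const costMini = loopTokens * INPUT_PRICE['gpt-4o-mini']

console.log(loopTokens)            // prints 3185000
console.log(costGpt4o.toFixed(2))  // prints 7.96
console.log(costMini.toFixed(2))   // prints 0.48
```

Under these assumptions the growth term dominates: 200 calls with a flat 1,000-token prompt would be 200,000 tokens, while the growing history pushes the total past 3 million.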

Three strategies control costs: history truncation (keep only the last N messages), conversation summarization (replace old messages with a summary), and model selection (use gpt-4o-mini for simple tasks, gpt-4o for complex reasoning). Summarization is the most effective — it preserves context while reducing token count by 80-90%.
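
History truncation is the simplest of the three strategies. The sketch below keeps the newest messages that fit a token budget, walking backwards so recent context is the last thing to be dropped. The trimHistory and estimateTokens names are hypothetical, and the 4-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
}

// Rough heuristic, assumed here: about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Keep the newest messages that fit in maxTokens, preserving chronological order.
function trimHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = []
  let used = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content)
    if (used + cost > maxTokens) break
    kept.unshift(messages[i])
    used += cost
  }
  return kept
}

const conversation: ChatMessage[] = [
  { role: 'user', content: 'a'.repeat(40) },      // about 10 tokens
  { role: 'assistant', content: 'b'.repeat(40) }, // about 10 tokens
  { role: 'user', content: 'c'.repeat(20) },      // about 5 tokens
]
const trimmed = trimHistory(conversation, 15)
```

With a 15-token budget, trimmed holds the last two messages and the oldest user message is dropped.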

io.thecodeforge.ai-agent.cost-control.tsTYPESCRIPT
// ============================================
// Cost Control: Token Management
// ============================================

import { ChatOpenAI } from '@langchain/openai'
import { createClient } from '@/lib/supabase/server'

// ---- Strategy 1: Conversation Summarization ----
// Summarize old messages when the conversation gets too long
// Reduces token count by 80-90% while preserving context

export async function getConversationHistory(conversationId: string, maxTokens: number = 8000) {
  const supabase = await createClient()

  // Get the latest summary (if any); maybeSingle() yields null rather than an error when none exists
  const { data: summary } = await supabase
    .from('conversation_summaries')
    .select('summary, message_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: false })
    .limit(1)
    .maybeSingle()

  // Get messages after the summary
  const offset = summary?.message_count ?? 0
  const { data: recentMessages } = await supabase
    .from('messages')
    .select('role, content, token_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: true })
    .range(offset, offset + 49)

  // Build the history
  const history: { role: string; content: string }[] = []

  // Add summary as a system message if it exists
  if (summary) {
    history.push({
      role: 'system',
      content: `[Previous conversation summary]: ${summary.summary}`,
    })
  }

  // Walk backwards from the newest message so the most recent turns survive the token limit
  let totalTokens = summary ? estimateTokens(summary.summary) : 0
  const kept: { role: string; content: string }[] = []
  const messages = recentMessages ?? []
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i]
    const tokens = msg.token_count ?? estimateTokens(msg.content)
    if (totalTokens + tokens > maxTokens) {
      break // Older messages no longer fit
    }
    kept.unshift({ role: msg.role, content: msg.content })
    totalTokens += tokens
  }

  return [...history, ...kept]
}

// Rough heuristic (assumption: about 4 characters per token); swap in a real tokenizer for exact counts
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// ---- Strategy 2: Model Selection Based on Task Complexity ----
// Use cheaper models for simple tasks, expensive models for complex reasoning

export function selectModel(taskComplexity: 'simple' | 'moderate' | 'complex') {
  const models = {
    simple: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 500, // Limit output tokens for simple responses
    }),
    moderate: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 1000,
    }),
    complex: new ChatOpenAI({
      modelName: 'gpt-4o',
      temperature: 0,
      maxTokens: 2000,
    }),
  }
  return models[taskComplexity]
}

// ---- Strategy 3: Token Budget Per Conversation ----
// Track and enforce token limits per conversation

export async function checkTokenBudget(conversationId: string): Promise<boolean> {
  const supabase = await createClient()
  const MAX_TOKENS_PER_CONVERSATION = 50000

  const { data: conversation } = await supabase
    .from('conversations')
    .select('total_tokens')
    .eq('id', conversationId)
    .single()

  if (!conversation) return true // No usage recorded yet

  if (conversation.total_tokens >= MAX_TOKENS_PER_CONVERSATION) {
    // Budget exhausted: summarize so a follow-up can start from compact context,
    // then return false so the caller stops instead of running the agent again.
    // summarizeConversation is assumed to be defined elsewhere in the project.
    await summarizeConversation(conversationId)
    return false
  }

  return true
}

// ---- Cost Estimation ----
// Estimate cost before running the agent

const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.50 / 1_000_000, output: 10.00 / 1_000_000 },
  'gpt-4o-mini': { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 },
  'text-embedding-3-small': { input: 0.02 / 1_000_000, output: 0 },
}

export function estimateCost(
  modelName: string,
  inputTokens: number,
  outputTokens: number
): number {
  const costs = MODEL_COSTS[modelName] ?? MODEL_COSTS['gpt-4o']
  return (inputTokens * costs.input) + (outputTokens * costs.output)
}

// ---- Usage Tracking ----
// Store token usage per conversation for billing and monitoring

export async function trackUsage(
  conversationId: string,
  modelName: string,
  inputTokens: number,
  outputTokens: number
) {
  const supabase = await createClient()
  const cost = estimateCost(modelName, inputTokens, outputTokens)

  await supabase.from('usage_logs').insert({
    conversation_id: conversationId,
    model: modelName,
    input_tokens: inputTokens,
    output_tokens: outputTokens,
    cost,
  })

  // Update conversation total atomically via the Postgres function.
  // An rpc() call cannot be nested inside update(); it has to be awaited on its own.
  await supabase.rpc('increment_conversation_tokens', {
    conv_id: conversationId,
    tokens: inputTokens + outputTokens,
  })
}
Token Costs Scale with Conversation Length
  • Each agent iteration sends the full conversation history to the LLM — costs compound
  • A 20-turn conversation with 10 tool calls = 200 LLM calls = 100,000+ tokens
  • Summarization reduces token count by 80-90% while preserving context
  • Use gpt-4o-mini for simple tasks ($0.15/M input tokens) vs gpt-4o ($2.50/M input tokens)
  • Track per-conversation token usage in Supabase — set hard limits and alert on budget exceedance
Production Insight
Each agent iteration includes the full conversation history — costs compound with every turn.
Summarization reduces token count by 80-90% while preserving key context.
Rule: summarize after 8,000 tokens, use gpt-4o-mini for simple tasks, track per-conversation costs.
Key Takeaway
Token costs are the primary production concern — each iteration sends full history to the LLM.
Summarization, model selection, and token budgets control spend — implement all three.
Track per-conversation usage in Supabase — set hard limits and alert on exceedance.
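
The two thresholds in this section (summarize around 8,000 context tokens, cap a conversation at 50,000 total tokens) can be folded into one decision function that runs before every agent call. This is a sketch with hypothetical names (decideBudget, BudgetLimits); how you summarize or stop is left to the caller.

```typescript
type BudgetDecision = 'continue' | 'summarize' | 'stop'

interface BudgetLimits {
  summarizeAtTokens: number // context size that triggers summarization
  hardLimitTokens: number   // lifetime token spend allowed per conversation
}

// Thresholds taken from this section: summarize at 8,000 context tokens, cap at 50,000 total.
const DEFAULT_LIMITS: BudgetLimits = { summarizeAtTokens: 8_000, hardLimitTokens: 50_000 }

function decideBudget(
  contextTokens: number,
  totalTokensSpent: number,
  limits: BudgetLimits = DEFAULT_LIMITS
): BudgetDecision {
  // The hard limit wins: once it is reached, no further LLM calls are made.
  if (totalTokensSpent >= limits.hardLimitTokens) return 'stop'
  if (contextTokens >= limits.summarizeAtTokens) return 'summarize'
  return 'continue'
}

const freshConversation = decideBudget(2_000, 5_000)   // 'continue'
const longContext = decideBudget(9_500, 20_000)        // 'summarize'
const exhaustedBudget = decideBudget(3_000, 50_000)    // 'stop'
```

Keeping the decision pure makes it easy to unit test and to log alongside each agent run.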

Deployment: Environment, Monitoring, and Failure Handling

Production deployment requires three additions beyond the development setup: environment variable management (API keys, Supabase credentials), monitoring (token usage, error rates, latency), and failure handling (tool errors, LLM timeouts, rate limits).

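The deployment target matters. The limits cited here for Vercel Serverless Functions are 10 seconds by default and 300 seconds on the Pro plan, and agent loops with multiple tool calls can exceed either. Edge Runtime is not an unlimited escape hatch: it still has to start sending a response within a fixed window (about 25 seconds in Vercel's documentation), although a stream can keep running after that, and it has no Node.js APIs. For agents that need longer, start streaming early or deploy the agent logic to a separate service (AWS Lambda with its 15-minute timeout, or a containerized service).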
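
Whatever the platform limit is, the agent loop can check how much time remains before starting another step and return a partial result instead of being killed mid-call. The sketch below is an assumption-laden illustration: createDeadline and runWithDeadline are hypothetical helpers, the steps are synchronous stand-ins for real tool or LLM calls, and a fake clock keeps the example deterministic.

```typescript
// Hypothetical helper: tracks whether the time limit (minus a safety margin) has been used up.
function createDeadline(limitMs: number, safetyMarginMs: number, now: () => number = Date.now) {
  const startedAt = now()
  return {
    expired: () => now() - startedAt >= limitMs - safetyMarginMs,
  }
}

// Runs steps until the work is done or the deadline is near, then returns what it has.
function runWithDeadline(steps: Array<() => string>, deadline: { expired: () => boolean }) {
  const outputs: string[] = []
  for (const step of steps) {
    if (deadline.expired()) {
      return { finished: false, outputs }
    }
    outputs.push(step())
  }
  return { finished: true, outputs }
}

// Fake clock: each step "takes" 3 seconds.
let fakeNow = 0
const fakeClock = () => fakeNow
const makeStep = (label: string) => () => {
  fakeNow += 3_000
  return label
}

// 10-second limit with a 1-second margin: the fourth step is skipped.
const tenSecondDeadline = createDeadline(10_000, 1_000, fakeClock)
const partialRun = runWithDeadline(
  [makeStep('search'), makeStep('lookup'), makeStep('rerank'), makeStep('answer')],
  tenSecondDeadline
)
```

Here partialRun.finished is false and partialRun.outputs holds the first three labels, which leaves the handler time to return a "still working" message within the limit.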

io.thecodeforge.ai-agent.deployment.tsTYPESCRIPT
// ============================================
// Deployment Configuration
// ============================================

// ---- Environment Variables (.env.local) ----
// NEVER commit API keys to the repository

// OPENAI_API_KEY=sk-...
// SUPABASE_URL=https://your-project.supabase.co
// SUPABASE_ANON_KEY=eyJ...
// SUPABASE_SERVICE_ROLE_KEY=eyJ... (for server-side only)
// NODE_ENV=production

// ---- next.config.ts ----
// Configure for AI workloads

import type { NextConfig } from 'next'

const nextConfig: NextConfig = {
  // Increase body size limit for conversation history
  experimental: {
    serverActions: {
      bodySizeLimit: '2mb',
    },
  },
  // Configure for Vercel deployment
  serverExternalPackages: ['@langchain/openai', '@langchain/core'],
}

export default nextConfig

// ---- Rate Limiting (Upstash Redis — works in serverless) ----
// In-memory Map dies in serverless — use Redis

import { NextRequest, NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = new Redis({ url: process.env.UPSTASH_REDIS_REST_URL!, token: process.env.UPSTASH_REDIS_REST_TOKEN! })

const ratelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per minute
})

export async function rateLimitMiddleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith('/api/chat')) {
    // x-forwarded-for may hold a comma-separated proxy chain; the first entry is the client
    const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? 'unknown'
    const { success } = await ratelimit.limit(ip)
    if (!success) {
      return NextResponse.json(
        { error: 'Rate limit exceeded. Please wait before sending another message.' },
        { status: 429 }
      )
    }
  }
  return NextResponse.next()
}

// ---- Middleware: Rate Limiting ----
// Prevent abuse by limiting requests per client IP.
// Next.js only runs middleware exported under the name `middleware` from middleware.ts.
export { rateLimitMiddleware as middleware }

export const config = {
  matcher: ['/api/:path*'],
}

// ---- Error Handling Wrapper ----
// Catch and log agent errors for observability
export async function runAgentWithErrorHandling(
  conversationId: string,
  input: string
) {
  const startTime = Date.now()

  try {
    const agent = await createAgent(conversationId)
    const result = await agent.invoke({ input })

    // Log success
    await logAgentRun({
      conversationId,
      status: 'success',
      duration: Date.now() - startTime,
      steps: result.intermediateSteps?.length ?? 0,
      input,
      output: result.output,
    })

    return result
  } catch (error) {
    // Log failure
    await logAgentRun({
      conversationId,
      status: 'error',
      duration: Date.now() - startTime,
      error: error instanceof Error ? error.message : 'Unknown error',
      input,
    })

    // Return user-friendly error
    if (error instanceof Error && error.message.includes('timeout')) {
      return {
        output: 'The request took too long to process. Please try a simpler question or try again later.',
      }
    }

    if (error instanceof Error && error.message.includes('rate_limit')) {
      return {
        output: 'The AI service is currently experiencing high demand. Please wait a moment and try again.',
      }
    }

    return {
      // Do not claim a ticket was created: nothing in this handler creates one
      output: 'I encountered an error processing your request. Please try again or contact support.',
    }
  }
}

async function logAgentRun(data: Record<string, unknown>) {
  const supabase = await createClient()
  await supabase.from('agent_runs').insert(data)
}

// ---- Health Check Endpoint ----
// File: app/api/health/route.ts
export async function GET() {
  const checks = {
    openai: false,
    supabase: false,
  }

  // Check OpenAI
  try {
    const response = await fetch('https://api.openai.com/v1/models', {
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    })
    checks.openai = response.ok
  } catch {
    checks.openai = false
  }

  // Check Supabase
  try {
    const supabase = await createClient()
    const { error } = await supabase.from('conversations').select('id').limit(1)
    checks.supabase = !error
  } catch {
    checks.supabase = false
  }

  const healthy = Object.values(checks).every(Boolean)

  return Response.json(
    { status: healthy ? 'healthy' : 'degraded', checks },
    { status: healthy ? 200 : 503 }
  )
}
Deployment Considerations
  • Vercel Serverless Functions have a 10-second timeout (300s on Pro) — agent loops may exceed this
  • Edge Runtime can keep a stream open for long-running agents, but it must start responding within a fixed window and has no Node.js APIs
  • Rate limiting prevents abuse — 10 messages per minute per user is a reasonable default
  • Health checks verify OpenAI and Supabase connectivity — return 503 if either is down
  • Log every agent run with duration, steps, and errors — observability is critical for debugging
Production Insight
Vercel Serverless Functions have a 10-second timeout — agent loops with many tools may exceed it.
Rate limiting prevents abuse — 10 messages per minute per user is a reasonable baseline.
Rule: log every agent run with duration and error details — observability is critical for production.
Key Takeaway
Deployment requires rate limiting, health checks, and error handling — not just the agent code.
Vercel timeout limits constrain agent loop length — consider Edge Runtime or separate services.
Log every agent run — duration, steps, errors, and token usage for observability.
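
For readers who want to see what a sliding-window limit of 10 requests per 60 seconds means in practice, here is a self-contained sketch of the idea. It is an illustration only: the SlidingWindowLimiter class is hypothetical, it keeps state in memory (which does not survive across serverless instances, hence Redis in the deployment code), and Upstash's own implementation may differ internally.

```typescript
// Illustration only: a timestamp-log limiter with an injectable clock value.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>()
  private maxRequests: number
  private windowMs: number

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests
    this.windowMs = windowMs
  }

  // Returns true if the request is allowed, false if the caller is over the limit.
  allow(key: string, nowMs: number): boolean {
    const windowStart = nowMs - this.windowMs
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart)
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent)
      return false
    }
    recent.push(nowMs)
    this.hits.set(key, recent)
    return true
  }
}

// Same limits as Ratelimit.slidingWindow(10, '60 s') in the deployment code.
const limiter = new SlidingWindowLimiter(10, 60_000)
const decisions: boolean[] = []
for (let i = 0; i < 11; i++) {
  decisions.push(limiter.allow('203.0.113.7', i * 1_000)) // one request per second
}
const afterWindowSlides = limiter.allow('203.0.113.7', 61_000)
```

The first ten requests pass, the eleventh is rejected, and a request at 61 seconds passes again because the two oldest timestamps have slid out of the window.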

What No One Tells You About Prerequisites — And Why Missing One Wastes a Day

You don't need a laundry list of packages. You need a runtime that won't silently fail on streaming. Node 18+ is mandatory. Anything less and LangChain's callback handlers will hang your agent mid-loop with zero error messages.

Your local database matters too. Supabase local development with supabase start is non-negotiable. Prod-like indexes, real-time subscriptions, and row-level security all behave differently in the cloud if you've been testing against a remote project from day one. Burn that hour upfront.

CopilotKit's @copilotkit/react-core expects a context provider wrapping your Next.js app. Forget this and your agent will appear deaf — no errors, just an unresponsive chat window. The docs bury this. Now you know.

Last trap: OpenAI keys with no budget cap. Your agent's first runaway loop costs you $12 before you catch it. Set a hard limit in the OpenAI dashboard. Your wallet will thank you.

PrerequisiteCheck.jsJAVASCRIPT
// io.thecodeforge — javascript tutorial

// Verify environment before you write a single agent
const REQUIRED_NODE_VERSION = 18;

const nodeMajor = parseInt(process.version.slice(1).split('.')[0], 10);
if (nodeMajor < REQUIRED_NODE_VERSION) {
  console.error(`FATAL: Node ${process.version} detected. Need >= ${REQUIRED_NODE_VERSION}.`);
  process.exit(1);
}

// Confirm Supabase local chain is alive
import { createClient } from '@supabase/supabase-js';

const localSupabase = createClient(
  'http://localhost:54321',
  process.env.SUPABASE_LOCAL_SERVICE_ROLE_KEY
);

const { data, error } = await localSupabase.from('conversations').select('count').single();
if (error) {
  console.error('Supabase local container not responding. Run `supabase start` first.');
  process.exit(1);
}

console.log(`Env validated. Node ${process.version}, Supabase live. Building agent.`);
Output
Env validated. Node v20.11.0, Supabase live. Building agent.
Production Trap:
Never source your Supabase URL from a .env in CI. Use supabase link and supabase secrets instead. Hardcoded project refs produce silent 500s when your staging Supabase project restarts overnight.
Key Takeaway
Prerequisites are not a checklist — they are a gating process. Test every dependency before you write one line of agent logic.

The Dependency Install That Breaks — And How to Preempt It

npm install @copilotkit/react-core @copilotkit/runtime langchain @langchain/openai @supabase/supabase-js looks innocent. (The client library is @supabase/supabase-js; the bare supabase package on npm is the CLI.) But LangChain's peer dependency on zod will fight CopilotKit's version every time. You end up with two zod instances in your bundle, and your tool schemas silently pass anything, including malformed JSON.

Use npm ls zod after install. If you see more than one version, run npm dedupe and pray. Better yet: use a .npmrc with legacy-peer-deps=true for LangChain's ecosystem. It's ugly. It works.

CopilotKit's runtime package requires openai as a peer. LangChain uses its own wrapper. Import order matters. Pull in @langchain/openai before @copilotkit/runtime. If you reverse them, you get cryptic errors about missing ChatOpenAI constructors.

One more footgun: Supabase's @supabase/ssr package for Next.js App Router. Your server actions will drop session cookies unless you use the createServerClient factory exactly as their docs show. Copy-paste it. Don't reimplement.

DependencyCheck.jsJAVASCRIPT
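// io.thecodeforge — javascript tutorial

// Run this after install to catch version fights
import { execSync } from 'child_process';

// Walk the `npm ls --json` tree and record which package pulls in which zod version.
// The zod entries are nested under each dependent, not listed at the top level.
function collectZod(node, requiredBy, found) {
  for (const [name, dep] of Object.entries(node.dependencies || {})) {
    if (name === 'zod' && dep.version) found[requiredBy] = { version: dep.version };
    collectZod(dep, name, found);
  }
  return found;
}

try {
  // --all walks the full tree; --depth=0 would hide nested copies of zod
  const raw = execSync('npm ls zod --all --json', { encoding: 'utf8' });
  const found = collectZod(JSON.parse(raw), '(root)', {});
  const uniqueVersions = new Set(Object.values(found).map(d => d.version));

  if (uniqueVersions.size > 1) {
    console.warn('WARN: Multiple zod versions detected. Run `npm dedupe`.');
    console.table(found);
  } else if (uniqueVersions.size === 1) {
    console.log('OK: Single zod version in tree.');
  } else {
    console.log('zod not installed yet — clean slate.');
  }
} catch (err) {
  // npm ls exits non-zero when the tree has problems (for example invalid peer deps)
  console.error('npm ls failed:', err.message);
}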
Output
WARN: Multiple zod versions detected. Run `npm dedupe`.
┌────────────┬─────────┐
│ (index)    │ version │
├────────────┼─────────┤
│ langchain  │ 3.22.4  │
│ copilotkit │ 3.21.0  │
└────────────┴─────────┘
Senior Shortcut:
Lock LangChain to @langchain/core@0.1.50; anything newer breaks CopilotKit's runtime. Hardcode this in the overrides field of your package.json (resolutions if you use Yarn). You'll thank me when the next major ships.
Key Takeaway
Package resolution in the LangChain + CopilotKit + Supabase stack is a bug farm. Validate your tree, pin versions, and import in the right order.

Why Your Agent Can't See Supabase Data — And How to Fix It

You wired the database. The agent calls a tool named searchBlogs. The tool returns nothing. You check Supabase — rows are there. The problem isn't your SQL. It's the invisible wall between CopilotKit's frontend runtime and your Supabase client.

CopilotKit agents run on the server. When you pass supabase as a client to your tool function, you're passing a browser-side client. It can't see rows protected by Row Level Security unless you forward the authenticated session. The agent gets zero results. No error. Just silence.

Solution: create a server-side Supabase client inside your tool handler using the user's JWT from the request context. CopilotKit's CopilotRuntime exposes context.user with the token. Use createServerClient with that token. Every tool call then runs with the user's permissions.

Second trap: vector search queries return null metadata when your match_documents function doesn't select the right columns. Always explicitly select metadata -> 'title' in your Postgres function. The agent needs structured data, not a JSON blob.

SupabaseAuthTool.jsJAVASCRIPT
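// io.thecodeforge — javascript tutorial

// Server-side tool with authenticated Supabase access
import { createServerClient } from '@supabase/ssr';
import { tool } from '@langchain/core/tools';
import { z } from 'zod';

export const searchUserBlogs = tool(
  async ({ query }, config) => {
    // The JWT comes from the request context (passed via `configurable` when the
    // agent is invoked), never from LLM-generated tool arguments.
    const userToken = config?.configurable?.userToken;
    if (!userToken) {
      return JSON.stringify({ error: 'Missing user session' });
    }

    // Anon key plus the user's JWT: every query runs under that user's RLS policies.
    // The service_role key would bypass RLS, so it must not be used here.
    const supabase = createServerClient(
      process.env.SUPABASE_URL,
      process.env.SUPABASE_ANON_KEY,
      {
        cookies: {
          getAll: () => [], // Not used here: we authenticate with the token directly
          setAll: () => {},
        },
        auth: {
          autoRefreshToken: false,
          detectSessionInUrl: false,
        },
        global: { headers: { Authorization: `Bearer ${userToken}` } },
      }
    );

    // Validate the JWT before querying (setSession would also need a refresh token)
    const { data, error } = await supabase.auth.getUser(userToken);

    if (error || !data.user) {
      return JSON.stringify({ error: 'Session expired' });
    }

    const { data: blogs, error: queryError } = await supabase
      .from('blogs')
      .select('title, content, created_at')
      .textSearch('content', query)
      .limit(5);

    if (queryError) {
      return JSON.stringify({ error: queryError.message });
    }

    return JSON.stringify(blogs || []);
  },
  {
    name: 'searchUserBlogs',
    description: 'Search the authenticated user\'s blog posts',
    schema: z.object({
      query: z.string().describe('Keywords to search for in the blog content'),
    }),
  }
);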
Output
[{"title":"Building AI Agents","content":"...","created_at":"2024-03-15T10:00:00Z"}]
Production Trap:
Never use the service_role key here. It bypasses RLS and exposes every user's data. Use the user's JWT + createServerClient to enforce row-level security. Your clients will thank you when the audit comes.
Key Takeaway
Your agent's Supabase queries fail because the auth context is wrong — not the SQL. Always pass the user JWT into tool handlers and create a fresh server client.

The Dependency Install That Breaks — And How to Preempt It

You think npm install langchain is safe? Wrong. The LangChain.js ecosystem is a minefield of breaking updates that nuke your agent loop before it runs. You pull a version that dropped Chain in favor of Runnable without warning — now your imports fail silently, your streaming endpoint crashes, and you waste a day tracing phantom bugs.

The fix: lock your versions before you even think about coding. Use npm install langchain@0.1.x — the stable 0.1 series. Pair it with @langchain/openai@0.0.x and @langchain/community@0.0.x. Forget that, and you'll hit Cannot find module '@langchain/core/dist/runnables' at 2 AM.

Second: test your install with a single agent creation before wiring into Next.js. Run node -e "const { OpenAI } = require('@langchain/openai'); console.log('ok')". If it errors, the package is missing or cannot be resolved from your project root. npm already hoists dependencies into a flat node_modules; with pnpm, setting shamefully-hoist=true in your .npmrc forces the same layout. This kills the "why can't LangChain see OpenAI" problem dead.

verify-install.jsJAVASCRIPT
// io.thecodeforge — javascript tutorial
// Verify LangChain + OpenAI install before Next.js

// Requires OPENAI_API_KEY in the environment
const { ChatOpenAI } = require('@langchain/openai');

const model = new ChatOpenAI({ temperature: 0 });

async function test() {
  const res = await model.invoke('Just say OK');
  console.log('Agent ready:', res.content);
}

test().catch(err => console.error('Install broken:', err.message));
Output
Agent ready: OK
Production Trap:
Never use npm update on LangChain packages without reading changelogs. One ^0.2.0 bump and your streaming logic breaks because processResponse was renamed to parseStreamedChunks.
Key Takeaway
Lock LangChain to 0.1.x before a single import. Test install standalone before touching Next.js routes.

Why Your Agent Can't See Supabase Data — And How to Fix It

You built a tool that queries Supabase for user invoices. Your agent returns "I can't find that data" even though the Supabase table is full. Problem: you passed raw SQL to an AI agent that doesn't know your schema. The agent generates hallucinated column names like user_email when your table has email_address. That's not a bug — it's a broken tool design.

Fix: wrap Supabase queries in validated tools that enforce structure before the agent touches them. Use Zod schemas to sanitize inputs and return predictable JSON. For example, your getInvoice tool should accept only userId as a string, query the correct table with typed columns, and return { status: 'found' | 'missing', data: object } every time.

Second: give the agent a schema context. When you define the tool, include a description that lists exact column names: "Find invoices by userId. Table fields: id, email_address, amount, created_at." This stops the agent from inventing its own field names. Test the tool in isolation — call it directly, see the output, then wire it into the agent. If your tool works but the agent fails, your schema description is wrong, not your code.
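
One way to keep the description and the query from drifting apart is to generate both from a single column list, and to reject any field name the model invents with a message that names the valid ones. The sketch below uses the invoice columns from this section; buildDescription and checkFields are hypothetical helpers, not LangChain or Supabase APIs.

```typescript
// Single source of truth for the columns the tool may expose.
const INVOICE_COLUMNS = ['id', 'email_address', 'amount', 'created_at'] as const
type InvoiceColumn = (typeof INVOICE_COLUMNS)[number]

// Builds the tool description from the column list.
function buildDescription(purpose: string, columns: readonly string[]): string {
  return `${purpose} Table fields: ${columns.join(', ')}.`
}

type FieldCheck =
  | { ok: true; fields: InvoiceColumn[] }
  | { ok: false; message: string }

// Rejects any field name that is not a real column and tells the agent what is valid.
function checkFields(requested: string[]): FieldCheck {
  const allowed = INVOICE_COLUMNS as readonly string[]
  const unknown = requested.filter((f) => !allowed.includes(f))
  if (unknown.length > 0) {
    return {
      ok: false,
      message: `Unknown field(s): ${unknown.join(', ')}. Valid fields: ${INVOICE_COLUMNS.join(', ')}.`,
    }
  }
  return { ok: true, fields: requested as InvoiceColumn[] }
}

const invoiceDescription = buildDescription('Find invoices by userId.', INVOICE_COLUMNS)
const validRequest = checkFields(['id', 'amount'])
const hallucinatedRequest = checkFields(['user_email'])
```

The generated description matches the hand-written one in the prose, and the rejection message gives the agent enough information to retry with email_address instead of user_email.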

safe-supabase-tool.jsJAVASCRIPT
// io.thecodeforge — javascript tutorial
// Validated Supabase tool with Zod schema

import { z } from 'zod';
import { createClient } from '@supabase/supabase-js';

// The service role key bypasses Row Level Security, so this client must stay server-side
const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SERVICE_ROLE_KEY);

const schema = z.object({
  userId: z.string().uuid()
});

export const getInvoiceTool = {
  name: 'get_invoice',
  description: 'Find invoice by userId. Fields: id, email_address, amount, created_at.',
  schema,
  async call(input) {
    // Validate LLM-supplied arguments before they reach the database
    const parsed = schema.safeParse(input);
    if (!parsed.success) return { status: 'missing', data: null };

    const { data, error } = await supabase
      .from('invoices')
      .select('id, email_address, amount, created_at')
      .eq('user_id', parsed.data.userId)
      .single();

    if (error || !data) return { status: 'missing', data: null };
    return { status: 'found', data };
  }
};
Output
{ status: 'found', data: { id: 42, email_address: 'dev@example.com', amount: 199, created_at: '2024-03-15' } }
Senior Shortcut:
Always hardcode column names in the tool description. If the agent still hallucinates, prefix your description with 'CRITICAL: Do not change field names.' Works 90% of the time.
Key Takeaway
Your agent doesn't guess Supabase columns — it hallucinates them. Validate schema in the tool, spell out fields in the description.

Why Your LangChain Agent Fails in Production — And How Next.js App Router Fixes It

Most tutorials show agent loops in simple Node scripts. In production, you need request-scoped context, streaming to the UI, and error recovery without dropping user sessions. Next.js App Router provides this through Server Actions that maintain agent state across multiple function calls. The key insight is that your agent loop must be a controlled, resumable process — not a single long-running function. By combining Next.js Server Actions with LangChain's RunnableWithMessageHistory, you create a durable agent that survives page refreshes and network interruptions. Each action call picks up the conversation from Supabase, executes one turn of the agent loop, and streams the response back. This pattern eliminates the common timeout and memory leak issues that plague monolithic agent implementations.
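
The "one turn per call, state loaded and saved around it" shape can be shown without any framework. In this sketch an in-memory Map stands in for the Supabase-backed history and executeOneTurn is a placeholder for the real LLM call; both names, and the two-turn task, are assumptions made for illustration.

```typescript
interface AgentState {
  messages: string[]
  stepsTaken: number
  done: boolean
}

// Stand-in for the Supabase-backed store; a real implementation would read and write rows.
const stateStore = new Map<string, AgentState>()

// Placeholder for one reason-act-observe turn; a real agent would call the LLM here.
function executeOneTurn(state: AgentState, input: string): AgentState {
  const stepsTaken = state.stepsTaken + 1
  return {
    messages: [...state.messages, `user: ${input}`, `agent: reply ${stepsTaken}`],
    stepsTaken,
    done: stepsTaken >= 2, // pretend the task needs two turns
  }
}

// Each call loads state, runs exactly one turn, and saves before returning,
// so a crash or refresh between calls loses nothing.
function agentStep(sessionId: string, input: string): AgentState {
  const previous = stateStore.get(sessionId) ?? { messages: [], stepsTaken: 0, done: false }
  if (previous.done) return previous
  const next = executeOneTurn(previous, input)
  stateStore.set(sessionId, next)
  return next
}

const afterFirstCall = agentStep('session-1', 'Where is my order?')
const afterSecondCall = agentStep('session-1', 'Order 1234')
const afterThirdCall = agentStep('session-1', 'Thanks')
```

The third call returns the saved state unchanged because the task was already marked done, which is the property that keeps a resumed session from re-running finished work.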

agent-action.tsJAVASCRIPT
// io.thecodeforge — javascript tutorial
'use server'

import { createServerActionClient } from '@supabase/auth-helpers-nextjs'
import { RunnableWithMessageHistory } from '@langchain/core/runnables'
import { cookies } from 'next/headers'

// agentChain and SupabaseChatHistory are assumed to be defined elsewhere in the project
export async function agentAction(input: string) {
  const supabase = createServerActionClient({ cookies })
  const { data: { session } } = await supabase.auth.getSession()
  if (!session) {
    throw new Error('Not authenticated')
  }
  const agent = new RunnableWithMessageHistory({
    runnable: agentChain,
    getMessageHistory: (sessionId) => new SupabaseChatHistory(sessionId, supabase),
    inputMessagesKey: 'input',
    historyMessagesKey: 'chat_history'
  })
  const stream = await agent.stream({ input }, { configurable: { sessionId: session.user.id } })
  return stream
}
Output
Returns ReadableStream for Next.js streaming response
Production Trap:
Never create a new agent instance per request. Use dependency injection to share the compiled chain across requests — but scope the message history to the user session.
Key Takeaway
Server Actions + RunnableWithMessageHistory = durable, resumable agents.

The Vector Search Index That Wastes Your Credits — And How LangChain + Supabase Fixes It

Every vector search call costs you token credits and latency. Naive implementations re-embed the user query on every turn, often searching irrelevant chunks that burn through your context window. The fix is a two-tier retrieval system using Supabase's native vector extension and LangChain's parent document retriever. First, embed your documents once with OpenAI embeddings and store both chunk-level and document-level vectors in Supabase. Then, at query time, retrieve only relevant document IDs first, then fetch their children chunks. This reduces token consumption by 70% and eliminates irrelevant context injection. Supabase supports this natively with SQL functions that filter by metadata before performing the cosine similarity search. Your agent only sees the top-k relevant chunks, not the entire document base.
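
The selection step can be illustrated in isolation. The sketch below follows the order the parent document retriever uses (rank small child chunks first, then collapse the hits to distinct parent documents), which is one way to realize the two-tier idea; topParents is a hypothetical function and the 2-dimensional embeddings are toy values.

```typescript
interface ChildChunk {
  parentId: string
  embedding: number[]
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Tier 1: rank child chunks. Tier 2: collapse to distinct parent IDs, best match first.
function topParents(query: number[], chunks: ChildChunk[], childK: number, parentK: number): string[] {
  const ranked = chunks
    .map((c) => ({ parentId: c.parentId, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, childK)
  const parentIds: string[] = []
  for (const hit of ranked) {
    if (!parentIds.includes(hit.parentId)) parentIds.push(hit.parentId)
    if (parentIds.length === parentK) break
  }
  return parentIds
}

// Toy 2-dimensional embeddings, purely illustrative.
const toyChunks: ChildChunk[] = [
  { parentId: 'doc-a', embedding: [1, 0] },
  { parentId: 'doc-a', embedding: [0.9, 0.1] },
  { parentId: 'doc-b', embedding: [0.6, 0.8] },
  { parentId: 'doc-c', embedding: [0, 1] },
]
const selectedParents = topParents([1, 0], toyChunks, 3, 2)
```

Two of the three best chunks belong to doc-a, but the result lists doc-a only once, followed by doc-b. That deduplication is what keeps overlapping chunks from flooding the context window.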

parent-retriever.ts
// io.thecodeforge — javascript tutorial
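import { SupabaseVectorStore } from '@langchain/community/vectorstores/supabase'
import { OpenAIEmbeddings } from '@langchain/openai'
import { InMemoryStore } from '@langchain/core/stores'
import { ParentDocumentRetriever } from 'langchain/retrievers/parent_document'
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'
import type { SupabaseClient } from '@supabase/supabase-js'

export async function createRetriever(supabaseClient: SupabaseClient) {
  // Child-chunk embeddings live in the Supabase `documents` table
  const store = new SupabaseVectorStore(new OpenAIEmbeddings(), {
    client: supabaseClient,
    tableName: 'documents',
    queryName: 'match_documents'
  })
  const retriever = new ParentDocumentRetriever({
    vectorstore: store,
    // Parent documents need their own store. InMemoryStore is for local testing only:
    // it is lost when a serverless instance recycles, so use a persistent store in production.
    byteStore: new InMemoryStore<Uint8Array>(),
    parentSplitter: new RecursiveCharacterTextSplitter({ chunkSize: 2000 }), // characters, not tokens
    childSplitter: new RecursiveCharacterTextSplitter({ chunkSize: 400 }),
    childK: 20, // child chunks fetched per query
    parentK: 5 // parent documents returned to the agent
  })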
  return retriever
}
Output
Retriever that returns at most 5 parent documents (up to 10,000 characters) instead of 50 raw chunks (20,000+ characters)
Production Trap:
Default Supabase vector search returns raw chunks without deduplication. Add a GROUP BY document_id in your match_documents SQL function to avoid flooding the context window with overlapping content.
Key Takeaway
The parent document retriever cuts token costs by about 70% by returning a few deduplicated parent sections instead of many overlapping chunks.
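
The core of the parent lookup is small enough to sketch without any library. The version below assumes child hits arrive with a `parentId` and a similarity `score` (both hypothetical field names) and shows the deduplicate-and-cap step that keeps overlapping chunks out of the context window. It is an illustration of the idea, not LangChain's internal implementation.

```typescript
// Hypothetical shape of a child-chunk search hit.
interface ChildHit {
  parentId: string;
  score: number; // higher means more similar
}

// Return up to parentK distinct parent IDs, ordered by their best-scoring child.
function parentsFromChildHits(hits: ChildHit[], parentK: number): string[] {
  const sorted = [...hits].sort((a, b) => b.score - a.score);
  const seen = new Set<string>();
  const parents: string[] = [];
  for (const hit of sorted) {
    if (parents.length >= parentK) break; // cap reached
    if (seen.has(hit.parentId)) continue; // several children of one parent count once
    seen.add(hit.parentId);
    parents.push(hit.parentId);
  }
  return parents;
}
```

With 20 child hits that belong to only three documents, the agent receives three parent sections rather than 20 near-duplicate fragments.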
● Production incident · POST-MORTEM · severity: high

Agent Infinite Loop Cost $4,200 in OpenAI Credits in 3 Hours

Symptom
The billing alert fired at 2 AM — OpenAI API spend had exceeded the daily budget by 84x. The agent logs showed 14,000 tool calls for a single conversation. The agent was stuck in a loop: search_knowledge_base -> result -> "I need more information" -> search_knowledge_base -> same result -> repeat.
Assumption
The AgentExecutor default maxIterations (15) would prevent infinite loops. The team assumed the agent would stop after 15 steps and return a partial answer.
Root cause
The team had set maxIterations to 0 (unlimited) during development to debug a complex multi-step workflow and never changed it back. The agent's prompt did not include a fallback instruction for when tools return no new information. The knowledge base search returned the same 3 results every time, but the agent kept rephrasing the query because the prompt said "search until you find the answer." Without a max iteration limit or a "stop if no new information" instruction, the loop continued indefinitely.
Fix
Set the step limit to 10 for production (maxIterations on LangChain's AgentExecutor; maxSteps if the loop runs through the Vercel AI SDK). Added a tool result deduplication check: if the same search result is returned twice, the agent is instructed to synthesize an answer from available information instead of searching again. Added a hard cost ceiling via OpenAI usage limits ($100/day). Added per-conversation token tracking in Supabase: conversations exceeding 8,000 tokens trigger automatic summarization. Added an alert for any conversation exceeding 20 tool calls.
Key lesson
  • Always set a step limit on the agent (maxIterations or maxSteps, depending on the framework). Unlimited loops will drain your API budget.
  • Include fallback instructions in the agent prompt: what to do when tools return no new information.
  • Set OpenAI usage limits as a safety net — they are the last line of defense against cost overruns.
  • Track per-conversation token usage — summarize or truncate history when it exceeds your budget threshold.
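
The two guards from this incident, a hard step cap and a repeated-result check, can be sketched as a framework-free loop. The `decide` and `runTool` callbacks are hypothetical stand-ins for the LLM call and the tool executor; in a real agent you would hand the `duplicate_result` case back to the model with an instruction to answer from what it already has.

```typescript
type ToolCall = { tool: string; input: string };
type StepResult = { done: true; answer: string } | { done: false; call: ToolCall };
type StopReason = "answered" | "max_steps" | "duplicate_result";

interface LoopOutcome {
  reason: StopReason;
  steps: number; // number of tool calls executed
  observations: string[];
  answer?: string;
}

function runGuardedLoop(
  decide: (observations: string[]) => StepResult, // stand-in for the LLM
  runTool: (call: ToolCall) => string, // stand-in for the tool executor
  maxSteps: number
): LoopOutcome {
  const observations: string[] = [];
  const seenResults = new Set<string>();

  for (let step = 0; step < maxSteps; step++) {
    const next = decide(observations);
    if (next.done) {
      return { reason: "answered", steps: step, observations, answer: next.answer };
    }
    const result = runTool(next.call);
    // Key on tool name plus result: a rephrased query that returns the same
    // payload still counts as "no new information".
    const key = `${next.call.tool}:${result}`;
    if (seenResults.has(key)) {
      return { reason: "duplicate_result", steps: step + 1, observations };
    }
    seenResults.add(key);
    observations.push(result);
  }
  return { reason: "max_steps", steps: maxSteps, observations };
}
```

Replaying the incident against this loop, a search tool that keeps returning the same three results stops the run on the second call instead of the 14,000th.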
Production debug guide · Diagnose agent loops, tool failures, and memory issues · 7 entries
Symptom · 01
Agent enters infinite tool-calling loop
Fix
Check the agent's step limit (maxIterations on AgentExecutor, maxSteps in the Vercel AI SDK) and set it to 10 for production. Add deduplication for tool results.
Symptom · 02
Agent hallucinates tool input parameters
Fix
Add Zod schema validation to tool input — reject invalid inputs before the tool executes
Symptom · 03
Conversation history not persisting across requests
Fix
Verify Supabase client is writing to the conversations table on each message — check RLS policies
Symptom · 04
Streaming response stops mid-token
Fix
Check that the Route Handler returns streamText().toDataStreamResponse() and the client uses useChat from Vercel AI SDK
Symptom · 05
Agent ignores tools and answers directly
Fix
Check the agent prompt — ensure tool descriptions are clear and the prompt instructs the agent to use tools
Symptom · 06
Vector search returns irrelevant results
Fix
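Check embedding model consistency: the same model must be used for indexing and querying. Verify the match_threshold parameter (0.78-0.82 is the suggested starting range for text-embedding-3-small; calibrate it against your own data, since similarity scores vary by embedding model).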
Symptom · 07
Token costs higher than expected
Fix
Check conversation history length — long histories multiply token usage. Implement summarization for conversations over 8,000 tokens.
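
Why long histories multiply cost is easiest to see with arithmetic. The sketch below assumes each turn adds a fixed number of tokens and that the full history is resent on every request; it ignores system prompts and tool outputs, so treat the numbers as a lower bound.

```typescript
// Total input tokens sent over a conversation when the whole history is
// resent each turn: request n carries n * tokensPerTurn tokens.
function cumulativeInputTokens(turns: number, tokensPerTurn: number): number {
  let total = 0;
  for (let n = 1; n <= turns; n++) {
    total += n * tokensPerTurn;
  }
  return total;
}

// Closed form of the same sum: t * N * (N + 1) / 2.
function cumulativeInputTokensClosedForm(turns: number, tokensPerTurn: number): number {
  return (tokensPerTurn * turns * (turns + 1)) / 2;
}
```

At 200 tokens per turn, the 50th request carries 10,000 tokens, but the conversation as a whole has sent 255,000. Per-request cost grows linearly; the bill grows with the square of the turn count.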
★ AI Agent Quick Debug Reference · Fast commands for diagnosing agent, tool, and memory issues
Agent not calling tools
Immediate action
Check agent prompt and tool descriptions
Commands
grep -rn 'tool\|Tool\|description' lib/agent/ --include='*.ts' | head -20
head -60 lib/agent/tools.ts
Fix now
Verify tools are passed to createToolCallingAgent and each tool has a clear description string
Supabase vector search returning empty
Immediate action
Check if embeddings exist in the database
Commands
curl -s "${SUPABASE_URL}/rest/v1/documents?select=count" -H "apikey: ${SUPABASE_ANON_KEY}" | jq
curl -s "${SUPABASE_URL}/rest/v1/rpc/match_documents" -X POST -H "apikey: ${SUPABASE_ANON_KEY}" -H "Content-Type: application/json" -d '{"query_embedding":[0.1,0.2],"match_count":5}' | jq
Fix now
Verify the match_documents RPC function exists and embeddings were generated with the same model
Streaming not working
Immediate action
Check the Route Handler returns a streaming response
Commands
grep -rn 'streamText\|toDataStreamResponse' app/api/ --include='*.ts' | head -10
curl -s -N http://localhost:3000/api/chat -X POST -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"hello"}]}' | head -5
Fix now
Use streamText from ai package and ensure the client uses useChat hook
Token costs spiking
Immediate action
Check conversation history length and token usage
Commands
grep -rn 'tokenCount\|token_count\|usage' lib/agent/ --include='*.ts' | head -10
psql "${DATABASE_URL}" -c "SELECT conversation_id, SUM(token_count) as total FROM messages GROUP BY conversation_id ORDER BY total DESC LIMIT 10"
Fix now
Implement conversation summarization when history exceeds 8,000 tokens
Agent Frameworks Compared
Framework | Language | Agent Type | Streaming | Tool Ecosystem | Production Ready | Best For
LangChain | Python, JS/TS | ReAct, Tool Calling | Yes (with streamEvents) | Large | Yes* | Complex multi-tool agents, RAG pipelines
Vercel AI SDK | JS/TS | OpenAI Functions | Yes (native) | Small | Yes | Simple chat agents, streaming-first apps
CrewAI | Python | Role-based multi-agent | Limited | Medium | Growing | Multi-agent collaboration, research tasks
AutoGen | Python | Conversational multi-agent | Yes | Medium | Yes | Multi-agent conversations, code generation
Direct OpenAI API | Any | Function calling | Yes | None (manual) | Yes | Simple single-tool agents, full control

Key takeaways

1. An agent decides tool sequences dynamically, unlike chains, which follow fixed steps.
2. A step limit on the agent (maxIterations or maxSteps) prevents infinite loops. Always set it to 10 in production.
3. Supabase stores conversation history and vector embeddings: the database is the agent's memory.
4. Conversation summarization can reduce token costs by 80-90% while preserving context.
5. Tool descriptions guide the LLM's tool selection. Write them like instructions, not documentation.
6. Validate tool inputs with Zod. LLMs hallucinate invalid arguments in multi-step workflows.

Common mistakes to avoid

8 patterns
×

No step limit on the agent

Symptom
Agent enters infinite tool-calling loop. Consumes millions of tokens and thousands of dollars in API credits within hours. The agent keeps calling the same tool with slightly different inputs, never reaching a conclusion.
Fix
Set maxIterations to 10 on the AgentExecutor that runs the agent built by createToolCallingAgent (or maxSteps if the loop runs through the Vercel AI SDK). Add deduplication logic to detect repeated tool results. Set OpenAI usage limits as a safety net. Monitor per-conversation token counts.
×

Vague tool descriptions that do not explain when to use the tool

Symptom
Agent either never uses the tool (does not know when it is relevant) or uses it incorrectly (applies it to the wrong type of query). Tool selection accuracy drops below 50%.
Fix
Write tool descriptions that explain WHEN to use the tool, not just what it does. Include examples of good queries. List what the tool should NOT be used for. Add .describe() to every Zod schema field.
×

No input validation on tool parameters

Symptom
Tool receives hallucinated inputs from the LLM — missing required fields, wrong types, malformed queries. Tool crashes with unhandled errors, and the agent gets an unclear error message.
Fix
Validate all tool inputs with Zod before execution. Return clear, actionable error messages that tell the agent what went wrong and how to retry with valid inputs.
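
The shape of "validate, then return an actionable error" looks like this. To keep the example dependency-free it uses a hand-written validator as a stand-in for a Zod schema; the `orderId` format and the `includeItems` field are hypothetical. With Zod you would express the same rules as a schema and map its issues to messages like these.

```typescript
// Hypothetical input for an order-lookup tool.
interface OrderLookupInput {
  orderId: string;
  includeItems: boolean;
}

type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

function validateOrderLookup(raw: unknown): Validation<OrderLookupInput> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "Input must be a JSON object with an orderId field." };
  }
  const input = raw as Record<string, unknown>;

  const orderId = input["orderId"];
  if (typeof orderId !== "string" || !/^ORD-\d+$/.test(orderId)) {
    // Tell the model exactly what a valid value looks like so it can retry.
    return { ok: false, error: 'orderId must be a string like "ORD-1234". Retry with a valid order ID.' };
  }

  const includeItems = input["includeItems"] ?? false; // optional, defaults to false
  if (typeof includeItems !== "boolean") {
    return { ok: false, error: "includeItems must be true or false." };
  }

  return { ok: true, value: { orderId, includeItems } };
}
```

The tool runs only on the `ok: true` branch. On failure, the error string goes back to the agent as the tool result, which gives it something concrete to correct.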
×

Storing conversation history only in application memory

Symptom
Conversation history is lost when the serverless function terminates. Users lose context on every page refresh. Multi-turn conversations do not work across requests.
Fix
Store conversation history in Supabase PostgreSQL. Load history from the database on every request. Use a custom history class that matches your token-tracking schema.
×

No conversation summarization for long sessions

Symptom
Per-request token cost grows linearly with conversation length, so the cumulative cost of a conversation grows roughly quadratically. A 50-turn conversation can consume 100,000+ tokens per agent iteration. Monthly API costs exceed budget by 5-10x.
Fix
Implement conversation summarization when history exceeds 8,000 tokens. Use gpt-4o-mini for summarization (cheaper). Replace old messages with a summary in the agent context.
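
A minimal sketch of the summarization trigger follows. It makes two loud assumptions: token counts are estimated at roughly four characters per token rather than measured with a real tokenizer, and `summarize` is a synchronous placeholder for what would be an async call to a cheaper model in production.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic, not a tokenizer: about 4 characters per token.
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// Collapse everything except the most recent messages into one summary
// message once the history exceeds the token budget.
function compactHistory(
  messages: Message[],
  summarize: (older: Message[]) => string, // placeholder for an LLM call
  budgetTokens: number = 8000,
  keepRecent: number = 4
): Message[] {
  if (estimateTokens(messages) <= budgetTokens || messages.length <= keepRecent) {
    return messages; // under budget: leave history untouched
  }
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);
  const summary: Message = {
    role: "system",
    content: `Summary of earlier conversation: ${summarize(older)}`,
  };
  return [summary, ...recent];
}
```

Keeping the last few messages verbatim preserves the immediate thread of the conversation, while the summary carries the older context at a fraction of the token cost.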
×

Using gpt-4o for all tasks including simple lookups

Symptom
Simple tool calls (order status lookups, basic searches) cost 15x more than necessary. Monthly API costs are dominated by trivial operations that do not require advanced reasoning.
Fix
Use gpt-4o-mini for simple tasks (order lookups, basic searches) and gpt-4o for complex reasoning (multi-tool workflows, refund calculations). Route tasks to the appropriate model based on complexity.
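
Routing can start as a plain function before it becomes anything smarter. The task kinds and the tool-count rule below are hypothetical; the point is only that the model choice is made per task rather than hard-coded once.

```typescript
// Hypothetical task descriptor produced by your own request classifier.
interface Task {
  kind: "order_lookup" | "basic_search" | "refund_calculation" | "multi_tool_workflow";
  toolCount: number; // how many tools the task is expected to need
}

type ModelName = "gpt-4o-mini" | "gpt-4o";

const COMPLEX_KINDS = new Set<string>(["refund_calculation", "multi_tool_workflow"]);

// Send multi-step or calculation-heavy work to the larger model and
// everything else to the cheaper one.
function pickModel(task: Task): ModelName {
  if (COMPLEX_KINDS.has(task.kind) || task.toolCount > 1) return "gpt-4o";
  return "gpt-4o-mini";
}
```

Log which model handled each request so you can check whether the routing rule is actually moving the bulk of traffic to the cheaper model.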
×

No rate limiting on the chat API endpoint

Symptom
A single user or bot sends 1,000 messages per minute, exhausting the OpenAI rate limit and causing errors for all users. API costs spike unexpectedly.
Fix
Add rate limiting middleware using Upstash Redis (sliding window) — 10 messages per minute per user is a reasonable default.
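
The sliding-window idea itself fits in a few lines. This sketch keeps timestamps in process memory purely for illustration: on serverless, each instance would have its own counters, which is why a shared store such as Upstash Redis is the practical choice. The class name and method are hypothetical.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the hit if the user is under the limit.
  allow(userId: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(userId, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(userId, recent);
    return true;
  }
}
```

With `new SlidingWindowLimiter(10, 60_000)`, the eleventh message inside a minute is rejected for that user only, and the window frees up as old timestamps age out.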
×

Not logging agent runs for observability

Symptom
When the agent produces a wrong answer or enters a loop, there is no way to debug what happened. No visibility into which tools were called, what inputs were used, or how many steps were taken.
Fix
Log every agent run with: conversation ID, input, output, intermediate steps, duration, token count, and error details. Use these logs to identify patterns in agent failures.
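
One way to keep those fields consistent is to build the log record in a single function. The field names below are hypothetical and mirror the list above; where the record goes (a Supabase table, a log drain) is left to you.

```typescript
interface AgentStep {
  tool: string;
  input: string;
}

interface AgentRunLog {
  conversationId: string;
  input: string;
  output: string | null;
  steps: AgentStep[];
  stepCount: number;
  durationMs: number;
  tokenCount: number;
  error: string | null;
}

function buildRunLog(params: {
  conversationId: string;
  input: string;
  output: string | null;
  steps: AgentStep[];
  startedAt: number; // epoch milliseconds
  finishedAt: number; // epoch milliseconds
  tokenCount: number;
  error?: unknown;
}): AgentRunLog {
  const { error } = params;
  return {
    conversationId: params.conversationId,
    input: params.input,
    output: params.output,
    steps: params.steps,
    stepCount: params.steps.length,
    durationMs: params.finishedAt - params.startedAt,
    tokenCount: params.tokenCount,
    // Normalize thrown values to a string so the record is always serializable.
    error: error === undefined ? null : error instanceof Error ? error.message : String(error),
  };
}
```

Storing `stepCount` as its own field makes "find every run with more than 20 tool calls" a simple filter instead of a JSON traversal.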
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
What is the difference between an AI agent and a chain in LangChain?
Q02 · SENIOR
How do you prevent an AI agent from entering an infinite loop?
Q03 · SENIOR
How do you manage conversation memory in a stateless serverless environm...
Q04 · SENIOR
How do you control token costs for an AI agent in production?
Q05 · JUNIOR
What is the role of Supabase in an AI agent architecture?
Q01 of 05 · SENIOR

What is the difference between an AI agent and a chain in LangChain?

ANSWER
A chain follows a fixed sequence of steps — retrieve context, format prompt, generate response. The sequence is defined at build time and does not change based on the input. An agent decides the sequence dynamically. It uses an LLM to reason about which tool to call, calls the tool, observes the result, and decides the next step. The sequence is determined at runtime based on the input and intermediate results. The trade-off: agents are more flexible (they can handle unexpected queries by choosing different tools) but more expensive (each step is a separate LLM call) and harder to debug (the sequence is non-deterministic). Chains are cheaper, faster, and predictable but limited to predefined workflows.
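
If it helps to see the distinction in code, here is a stripped-down sketch with no framework involved. `decide` is a hypothetical stand-in for the LLM: in the chain it does not exist, because the order of calls is written into the function; in the agent it chooses the next step at runtime.

```typescript
type Tool = (input: string) => string;

// Chain: the sequence (retrieve, then generate) is fixed when the code is written.
function runChain(question: string, retrieve: Tool, generate: Tool): string {
  const context = retrieve(question);
  return generate(`${question}\n${context}`);
}

// Agent: each step is chosen at runtime from the question and what has been observed.
type Decision =
  | { type: "tool"; name: string; input: string }
  | { type: "final"; answer: string };

function runAgent(
  question: string,
  tools: Record<string, Tool>,
  decide: (question: string, observations: string[]) => Decision, // stand-in for the LLM
  maxSteps: number = 10
): { answer: string; toolsCalled: string[] } {
  const observations: string[] = [];
  const toolsCalled: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const decision = decide(question, observations);
    if (decision.type === "final") return { answer: decision.answer, toolsCalled };
    const tool = tools[decision.name];
    observations.push(tool ? tool(decision.input) : `Unknown tool: ${decision.name}`);
    toolsCalled.push(decision.name);
  }
  return { answer: "Stopped: step limit reached.", toolsCalled };
}
```

The same `runAgent` call can use one tool, several, or none depending on the input, which is where both the flexibility and the extra cost per step come from.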
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
Can I use a different LLM instead of OpenAI with LangChain?
02
How do I test an AI agent in my CI/CD pipeline?
03
How do I handle multi-agent architectures where agents delegate to each other?
04
What is the difference between gpt-4o and gpt-4o-mini for agent tasks?
05
How do I deploy an AI agent that exceeds Vercel's serverless timeout?