An AI Agent uses an LLM to decide which tools to call and in what order — unlike a chain, it reasons about actions
LangChain provides the agent framework: createToolCallingAgent builds the agent, AgentExecutor runs the ReAct loop (reason, act, observe), and RunnableWithMessageHistory attaches conversation memory
Supabase stores conversation history and vector embeddings for long-term memory
Next.js 16 Route Handlers stream responses via the Vercel AI SDK (streamText); Server Actions handle non-streaming agent work
Token costs scale with conversation length — truncate or summarize history to control spend
Biggest mistake: no tool guardrails — agents can call tools infinitely without maxSteps or deduplication
Definition
What Are AI Agent Infinite Loops? (LangChain + Next.js Cost Control)
An AI agent loop is the runtime cycle where an LLM repeatedly decides, acts, and observes until it reaches a final answer — but without guardrails, it becomes a money incinerator. Each iteration calls the model, processes tool outputs, and re-invokes the LLM, burning tokens on every turn.
In a LangChain agent hooked into a Next.js app, a single user query can trigger dozens of loops, each costing fractions of a cent that compound into real bills. The problem isn't the loop itself — it's unbounded loops, where the agent gets stuck rethinking, re-calling tools, or generating verbose intermediate reasoning without ever terminating.
This article walks you through building an agent that uses Supabase for persistent memory and vector search, Next.js for streaming responses via Server Actions, and explicit cost controls like token budgets, loop limits, and conversation summarization. You'll learn to instrument every step, cap spend per session, and detect infinite loops before they drain your API credits.
Plain-English First
An AI Agent is like a smart assistant with a toolbox. Instead of just answering questions, it decides which tool to use — searching the web, querying a database, running calculations — and uses the result to inform its next step. LangChain is the framework that manages this decision loop. Supabase stores the conversation so the agent remembers what happened earlier. Next.js serves the interface and streams the agent's responses back to the user in real time.
Building an AI agent requires orchestrating four components: the LLM (reasoning engine), tools (external capabilities), memory (conversation history), and a runtime loop (agent executor). LangChain provides the agent framework with ReAct prompting — the model reasons about what to do, selects a tool, observes the result, and decides the next step.
Supabase serves two roles: PostgreSQL stores conversation history for session continuity, and pgvector stores embeddings for semantic search over past interactions. Next.js 16 provides the API layer — Server Actions and Route Handlers process requests, and the Vercel AI SDK streams token-by-token responses to the client.
The production challenges are cost control (token usage scales with history length), latency (multi-step agent loops add sequential delay), reliability (tools can fail, LLMs can hallucinate tool inputs), and observability (debugging why an agent chose a specific action). This guide covers the complete implementation with production patterns for each.
What an AI Agent Loop Actually Is — and Why It Burns Money
An AI agent loop is a feedback cycle in which an LLM generates actions, those actions trigger tool calls or API requests, and the output feeds back into the LLM for the next decision. In a Next.js + LangChain + Supabase stack, this loop lives server-side (often in an API route or server action) and persists state in Supabase: PostgreSQL for conversation history and tool results, pgvector for embeddings. The core mechanic: the LLM decides whether to continue or stop based on a system prompt and accumulated context. Without hard cost guards it can spin indefinitely, with each turn adding token cost and latency.
In practice, the loop runs inside a while or for loop in LangChain's AgentExecutor. Each iteration: (1) LLM call with current conversation + tool results, (2) parse output for action or final answer, (3) if action, execute tool (e.g., Supabase query, external API), (4) append result to memory, (5) repeat. The critical property is that the LLM has no inherent stop condition; it relies on prompt instructions or a max-iteration parameter. Without explicit cost control, a single user request can trigger 20+ LLM calls; at $0.01–$0.03 per call, that is $0.20–$0.60 or more per request.
Use this pattern when you need autonomous multi-step reasoning: data enrichment pipelines, customer support triage, or research assistants. It matters in production because naive loops are the #1 cause of runaway costs in LLM applications. A single buggy tool response (e.g., a Supabase query returning an unexpected null) can cause the agent to retry or hallucinate indefinitely, turning a $0.05 request into a $5.00 bill. Always pair the loop with a hard iteration cap (e.g., 10 turns) and a token budget per turn.
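The five steps above, plus the two guards, can be sketched in plain TypeScript. This is a minimal illustration rather than LangChain's implementation: the `Model` and `Tool` types, `runBoundedLoop`, and the stub model are hypothetical names introduced here, and token counts are supplied by the stub instead of a real API.

```typescript
// Hypothetical types: a "model" is any function that, given the transcript,
// returns either a tool request or a final answer, plus the tokens it used.
type ModelTurn =
  | { kind: 'tool'; tool: string; input: string; tokens: number }
  | { kind: 'final'; answer: string; tokens: number }

type Model = (transcript: string[]) => ModelTurn
type Tool = (input: string) => string

interface LoopResult {
  answer: string
  iterations: number
  tokensUsed: number
  stoppedBy: 'final' | 'maxIterations' | 'tokenBudget'
}

function runBoundedLoop(
  model: Model,
  tools: Record<string, Tool>,
  userInput: string,
  maxIterations = 10,
  tokenBudget = 20_000,
): LoopResult {
  const transcript = [`user: ${userInput}`]
  let tokensUsed = 0

  for (let i = 1; i <= maxIterations; i++) {
    const turn = model(transcript) // (1) LLM call
    tokensUsed += turn.tokens

    if (turn.kind === 'final') { // (2) a final answer ends the loop
      return { answer: turn.answer, iterations: i, tokensUsed, stoppedBy: 'final' }
    }
    if (tokensUsed >= tokenBudget) { // hard budget, enforced in code
      return { answer: 'Stopped: token budget exhausted.', iterations: i, tokensUsed, stoppedBy: 'tokenBudget' }
    }
    const tool = tools[turn.tool] // (3) execute the requested tool
    const observation = tool ? tool(turn.input) : `Unknown tool: ${turn.tool}`
    transcript.push(`tool(${turn.tool}): ${observation}`) // (4) append, (5) repeat
  }
  return { answer: 'Stopped: iteration cap reached.', iterations: maxIterations, tokensUsed, stoppedBy: 'maxIterations' }
}

// A stub model that never produces a final answer, simulating a stuck agent.
const stuckModel: Model = () => ({ kind: 'tool', tool: 'lookup', input: 'user-42', tokens: 500 })
const result = runBoundedLoop(stuckModel, { lookup: () => 'no data' }, 'Where is my order?')
console.log(result.stoppedBy, result.iterations, result.tokensUsed)
```

The point of the sketch is that both stop conditions live in ordinary code outside the model, so a stuck model cannot talk its way past them.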
The Infinite Loop Is Not a Bug — It's a Feature of the Architecture
The LLM will happily keep calling tools forever if you let it. The loop doesn't 'break' — it just keeps spending. You must enforce a max-iteration limit and a total token budget.
Production Insight
A customer support agent with a Supabase tool that returns empty results for missing user IDs caused the agent to retry the same query 47 times in one session, burning $1.20 in LLM calls before the user canceled.
Symptom: a single user request generates 30+ LLM calls in server logs, all repeating the same tool call pattern.
Rule of thumb: always set maxIterations ≤ 10 and wrap each tool call in a try-catch that returns a clear 'no data' message to break the loop.
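One way to apply that rule of thumb is a small wrapper around each tool. The sketch below is an assumption-laden illustration: `guardTool` and `ToolFn` are hypothetical names, and tools are synchronous here for brevity (real tools would usually be async). It converts thrown errors and empty results into a clear message, and short-circuits an identical repeated call.

```typescript
type ToolFn = (input: Record<string, unknown>) => string

// Wraps a tool so that (a) thrown errors and empty results become a readable
// message and (b) an identical repeated call is not executed again.
// `seen` should be created once per request, not shared across users.
function guardTool(name: string, fn: ToolFn, seen: Set<string>): ToolFn {
  return (input) => {
    const key = `${name}:${JSON.stringify(input)}`
    if (seen.has(key)) {
      return `You already called ${name} with these arguments. Answer from the earlier result or ask the user for clarification.`
    }
    seen.add(key)
    try {
      const output = fn(input)
      return output.trim() === ''
        ? `No data returned by ${name}. Do not retry with the same arguments.`
        : output
    } catch (err) {
      const message = err instanceof Error ? err.message : 'unknown error'
      return `${name} failed: ${message}. Do not retry with the same arguments.`
    }
  }
}

const seenThisRequest = new Set<string>()
const lookupUser = guardTool('lookup_user', ({ id }) => {
  if (id === 'missing') return '' // simulates an empty query result
  if (id === 'boom') throw new Error('connection reset')
  return `User ${String(id)} is active`
}, seenThisRequest)

console.log(lookupUser({ id: 'missing' })) // first call: a clear "no data" message
console.log(lookupUser({ id: 'missing' })) // identical repeat: short-circuited
```

Keeping the `seen` set per request matters: a set shared across users would block legitimate first-time queries.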
Key Takeaway
An AI agent loop is a recursive LLM-tool cycle that must be bounded by code, not by the model's will.
Without a hard iteration cap and token budget, a single request can cost more than a month of traditional API usage.
Always log iteration count and cumulative token cost per session — treat them as critical metrics, not debug info.
Architecture: Agent, Tools, Memory, and Runtime
An AI agent has four components that work together in a loop. The LLM is the reasoning engine: it reads the conversation history and tool descriptions, decides which tool to call, and interprets the result. Tools are external functions the agent can invoke, such as code execution and API calls. Memory stores conversation history so the agent maintains context across turns. The runtime (AgentExecutor) orchestrates the loop: send context to the LLM, parse the tool call, execute the tool, append the result, repeat.
The key difference between an agent and a chain: a chain follows a fixed sequence of steps (retrieve -> format -> generate). An agent decides the sequence dynamically — it might search, then calculate, then search again, then answer. This flexibility comes at a cost: each step is a separate LLM call, adding latency and token usage.
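The distinction can be made concrete with a toy sketch. Everything here is hypothetical scaffolding (`runChain`, `runAgent`, `Policy`, and the string-appending tools); in a real agent the policy is the LLM, and each policy call is a paid model invocation.

```typescript
type Step = (input: string) => string

// Chain: the order of steps is fixed in code.
function runChain(steps: Step[], input: string): string {
  return steps.reduce((acc, step) => step(acc), input)
}

// Agent: a policy (the LLM, in practice) picks the next step on every turn.
type Policy = (state: string, available: string[]) => string | null

function runAgent(
  policy: Policy,
  tools: Record<string, Step>,
  input: string,
  maxSteps = 10,
): { output: string; trace: string[] } {
  let state = input
  const trace: string[] = []
  for (let i = 0; i < maxSteps; i++) {
    const choice = policy(state, Object.keys(tools))
    if (choice === null) break // the policy decided it has enough to answer
    const tool = tools[choice]
    if (!tool) break // unknown tool name: stop instead of looping
    trace.push(choice)
    state = tool(state)
  }
  return { output: state, trace }
}

const demoTools: Record<string, Step> = {
  search: (s) => `${s} | search-result`,
  calculate: (s) => `${s} | calc-result`,
}

// A scripted policy standing in for the LLM: search, calculate, search again, then answer.
const demoPolicy: Policy = (state) => {
  const searches = state.split('search-result').length - 1
  const calcs = state.split('calc-result').length - 1
  if (searches === 0) return 'search'
  if (calcs === 0) return 'calculate'
  if (searches === 1) return 'search'
  return null
}

const chained = runChain([demoTools.search, demoTools.calculate], 'question')
const agentRun = runAgent(demoPolicy, demoTools, 'question')
console.log(chained)
console.log(agentRun.trace.join(' -> '))
```

The chain always performs two steps; the agent run performs three because the policy chose to search a second time, which is exactly where the extra latency and tokens come from.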
LangChain's ReAct agent type implements this pattern. The prompt instructs the LLM to Think (reason about what to do), Act (call a tool with specific inputs), and Observe (read the tool result). The loop continues until the agent decides it has enough information to answer, or the step cap is reached (maxIterations on LangChain's AgentExecutor, called maxSteps throughout this guide).
// ============================================
// AI Agent Architecture: Core Components (2026 LangChain 0.3+)
// ============================================

// ---- Component 1: LLM (Reasoning Engine) ----
// The LLM decides which tools to call and interprets results
import { ChatOpenAI } from '@langchain/openai'

const llm = new ChatOpenAI({
  modelName: 'gpt-4o',
  temperature: 0, // Deterministic for tool-calling; reduces hallucination
  openAIApiKey: process.env.OPENAI_API_KEY,
})

// ---- Component 2: Tools (External Capabilities) ----
// Each tool has a name, description, schema, and execution function
import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: 'Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.',
  schema: z.object({
    query: z.string().describe('The search query: use keywords from the user question'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general']).optional()
      .describe('Filter by category if the topic is clear'),
  }),
  func: async ({ query, category }) => {
    // Search Supabase vector store
    const results = await searchDocuments(query, category)
    if (results.length === 0) {
      return 'No results found. Try a different search query or answer from general knowledge.'
    }
    return results.map((r) => `[${r.title}]: ${r.content}`).join('\n\n')
  },
})

const getOrderStatus = new DynamicStructuredTool({
  name: 'get_order_status',
  description: 'Look up the status of a customer order by order ID. Returns shipping status, estimated delivery, and tracking number.',
  schema: z.object({
    orderId: z.string().describe('The order ID, format: ORD-XXXXX'),
  }),
  func: async ({ orderId }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) {
      return `Order ${orderId} not found. Ask the user to verify the order ID.`
    }
    return JSON.stringify({
      status: order.status,
      estimatedDelivery: order.estimatedDelivery,
      trackingNumber: order.trackingNumber,
    })
  },
})

const calculateRefund = new DynamicStructuredTool({
  name: 'calculate_refund',
  description: 'Calculate the refund amount for a return. Considers order total, return reason, and days since purchase.',
  schema: z.object({
    orderId: z.string().describe('The order ID'),
    returnReason: z.enum(['defective', 'wrong_item', 'changed_mind', 'not_as_described'])
      .describe('The reason for the return'),
  }),
  func: async ({ orderId, returnReason }) => {
    const order = await getOrderFromDatabase(orderId)
    if (!order) return 'Order not found.'

    const daysSincePurchase = Math.floor(
      (Date.now() - new Date(order.createdAt).getTime()) / (1000 * 60 * 60 * 24)
    )

    let refundPercentage = 1.0
    if (returnReason === 'changed_mind' && daysSincePurchase > 30) {
      refundPercentage = 0.0
    } else if (returnReason === 'changed_mind') {
      refundPercentage = 0.85
    }
    const refundAmount = order.total * refundPercentage
    return JSON.stringify({ refundAmount, refundPercentage, daysSincePurchase })
  },
})

// ---- Component 3: Memory (Conversation History) ----
// Stored in Supabase PostgreSQL for persistence across requests
// Custom history class to match your token-tracking schema
import { SupabaseChatMessageHistory } from '@langchain/community/stores/message/supabase'

function createMemory(conversationId: string) {
  return new SupabaseChatMessageHistory({
    supabaseClient: supabase,
    tableName: 'messages',
    sessionId: conversationId,
  })
}

// ---- Component 4: Agent (2026 pattern) ----
// createToolCallingAgent builds the agent, AgentExecutor runs the loop,
// and RunnableWithMessageHistory attaches per-conversation memory
import { AgentExecutor, createToolCallingAgent } from 'langchain/agents'
import { RunnableWithMessageHistory } from '@langchain/core/runnables'
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'

const prompt = ChatPromptTemplate.fromMessages([
  ['system', `You are a helpful customer support agent for AcmeCorp.
You have access to the following tools:
- search_knowledge_base: Search the internal knowledge base
- get_order_status: Look up order status by order ID
- calculate_refund: Calculate refund amounts for returns
Rules:
1. Always search the knowledge base before answering product questions
2. If a tool returns no new information, answer from what you know
3. Never make up order statuses; always use get_order_status
4. If you cannot answer, say so and offer to connect with a human agent
5. Maximum 10 tool calls per response; synthesize your answer after that`],
  new MessagesPlaceholder('chat_history'),
  ['human', '{input}'],
  new MessagesPlaceholder('agent_scratchpad'),
])

async function createAgent(conversationId: string) {
  const tools = [searchKnowledgeBase, getOrderStatus, calculateRefund]
  const agent = await createToolCallingAgent({
    llm,
    tools,
    prompt,
  })

  // The executor runs the think/act/observe loop; maxIterations is the hard cap.
  // The prompt's "Maximum 10 tool calls" rule is only a request to the model.
  const executor = new AgentExecutor({
    agent,
    tools,
    maxIterations: 10,
  })

  const agentWithHistory = new RunnableWithMessageHistory({
    runnable: executor,
    getMessageHistory: () => createMemory(conversationId),
    inputMessagesKey: 'input',
    historyMessagesKey: 'chat_history',
  })
  return agentWithHistory
}
The Agent Loop: Think, Act, Observe
Think: the LLM reads the conversation history and tool descriptions, decides which tool to call
Act: the agent calls the tool with specific inputs parsed from the LLM output
Observe: the tool result is appended to the conversation, and the LLM decides the next step
Each step is a separate LLM call — latency and cost scale with the number of iterations
maxSteps stops the loop — without it, the agent can run indefinitely and drain API credits
Production Insight
Each agent step is a separate LLM call — 10 iterations means 10 API calls per user message.
maxSteps is the safety net — without it, agents loop infinitely and drain API credits.
Rule: set maxSteps to 10 for production, add cost monitoring per conversation.
Key Takeaway
An agent decides tool sequences dynamically — unlike chains which follow fixed steps.
createToolCallingAgent + RunnableWithMessageHistory orchestrates the loop — maxSteps prevents infinite loops and cost overruns.
Each iteration is a separate LLM call — cost and latency scale with step count.
Supabase: Memory, Vector Search, and Conversation Storage
Supabase serves three roles in the agent architecture. PostgreSQL stores conversation history — every message (user and assistant) is persisted for session continuity. pgvector stores document embeddings for semantic search — the agent can retrieve relevant knowledge base entries. The Supabase client provides real-time subscriptions if you want to show agent activity to the user in real time.
The conversation storage pattern: each conversation has a unique ID. Messages are appended to a messages table with the conversation_id as a foreign key. When the agent processes a new message, it loads the conversation history from Supabase and prepends it to the LLM context. This gives the agent memory across turns without storing state in the application server.
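The load-and-prepend step can be sketched without a database. In this illustration an in-memory array stands in for the messages table, and `StoredMessage`, `loadHistory`, and `buildContext` are hypothetical names; a real implementation would replace `loadHistory` with a Supabase query filtered on conversation_id and ordered by created_at.

```typescript
interface StoredMessage {
  conversation_id: string
  role: 'user' | 'assistant' | 'system' | 'tool'
  content: string
  created_at: string // ISO-8601 timestamp
}

// Stand-in for "SELECT ... WHERE conversation_id = $1 ORDER BY created_at ASC".
function loadHistory(rows: StoredMessage[], conversationId: string): StoredMessage[] {
  return rows
    .filter((r) => r.conversation_id === conversationId)
    .sort((a, b) => (a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0))
}

// Prepends the stored history to the new user message to form the LLM context.
function buildContext(
  systemPrompt: string,
  history: StoredMessage[],
  newUserMessage: string,
): { role: string; content: string }[] {
  return [
    { role: 'system', content: systemPrompt },
    ...history.map(({ role, content }) => ({ role, content })),
    { role: 'user', content: newUserMessage },
  ]
}

const table: StoredMessage[] = [
  { conversation_id: 'c1', role: 'assistant', content: 'It shipped yesterday.', created_at: '2026-01-01T10:00:05Z' },
  { conversation_id: 'c2', role: 'user', content: 'Unrelated conversation', created_at: '2026-01-01T10:00:01Z' },
  { conversation_id: 'c1', role: 'user', content: 'Where is order ORD-12345?', created_at: '2026-01-01T10:00:00Z' },
]

const context = buildContext('You are a support agent.', loadHistory(table, 'c1'), 'When will it arrive?')
console.log(context.map((m) => `${m.role}: ${m.content}`).join('\n'))
```

Because the history is rebuilt from storage on every request, any server instance can handle the next turn of the conversation.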
Vector search uses pgvector's cosine similarity. Documents are chunked, embedded with OpenAI's text-embedding-3-small model, and stored with their embedding vectors. The match_documents RPC function performs nearest-neighbor search and returns the top-k results above a similarity threshold. This guide uses 0.78–0.82 for text-embedding-3-small in production; treat that as a starting range and tune it against your own documents, because similarity score distributions differ between embedding models.
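To see what the threshold and top-k limit do, here is an in-memory sketch of the same selection logic. The function names `cosineSimilarity` and `matchDocuments` and the 3-dimensional vectors are illustrative assumptions; real embeddings from text-embedding-3-small have 1536 dimensions and the search runs inside Postgres.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // avoid dividing by zero for empty vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

interface Doc {
  title: string
  embedding: number[]
}

// Keeps documents above the threshold, best match first, at most matchCount of them.
function matchDocuments(query: number[], docs: Doc[], matchCount = 5, matchThreshold = 0.8) {
  return docs
    .map((d) => ({ title: d.title, similarity: cosineSimilarity(query, d.embedding) }))
    .filter((d) => d.similarity > matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount)
}

const docs: Doc[] = [
  { title: 'Orthogonal', embedding: [0, 1, 0] },
  { title: 'Close', embedding: [0.9, 0.1, 0] },
  { title: 'Exact', embedding: [1, 0, 0] },
  { title: 'Diagonal', embedding: [1, 1, 0] },
]

const matches = matchDocuments([1, 0, 0], docs)
console.log(matches.map((m) => `${m.title} (${m.similarity.toFixed(2)})`).join(', '))
```

With these toy vectors, "Diagonal" scores about 0.71 and is dropped by the 0.8 threshold, which shows how a threshold set too high can silently return nothing for a reasonable query.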
io.thecodeforge.ai-agent.supabase-schema.sql (SQL)
-- ============================================
-- Supabase Schema for AI Agent
-- ============================================

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- ---- Conversations Table ----
-- Each conversation has a unique ID and tracks the user
CREATE TABLE public.conversations (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  title TEXT,
  total_tokens INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.conversations ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage their own conversations" ON public.conversations
  FOR ALL USING (auth.uid() = user_id)
  WITH CHECK (auth.uid() = user_id);

-- ---- Messages Table ----
-- Stores each message in a conversation
CREATE TABLE public.messages (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system', 'tool')),
  content TEXT NOT NULL,
  tool_calls JSONB, -- Stores tool call details for assistant messages
  tool_call_id TEXT, -- Links tool response to the original call
  token_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.messages ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can manage messages in their conversations" ON public.messages
  FOR ALL USING (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  )
  WITH CHECK (
    conversation_id IN (
      SELECT id FROM public.conversations WHERE user_id = auth.uid()
    )
  );

-- Index for fast conversation history retrieval
CREATE INDEX idx_messages_conversation_created
  ON public.messages (conversation_id, created_at ASC);

-- ---- Documents Table ----
-- Knowledge base with vector embeddings
CREATE TABLE public.documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  category TEXT, -- product, pricing, troubleshooting, general
  embedding vector(1536), -- text-embedding-3-small produces 1536-dim vectors
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE public.documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Documents are readable by authenticated users" ON public.documents
  FOR SELECT USING (auth.role() = 'authenticated');

-- Index for vector similarity search
CREATE INDEX ON public.documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- ---- Vector Search RPC Function ----
-- Performs nearest-neighbor search using cosine similarity
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_count INTEGER DEFAULT 5,
  match_threshold FLOAT DEFAULT 0.8, -- Production value for text-embedding-3-small
  filter_category TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  category TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.title,
    documents.content,
    documents.category,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM public.documents
  WHERE
    (filter_category IS NULL OR documents.category = filter_category)
    AND 1 - (documents.embedding <=> query_embedding) > match_threshold
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

-- ---- Conversation Summaries Table ----
-- Stores summarized history for long conversations
CREATE TABLE public.conversation_summaries (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES public.conversations(id) ON DELETE CASCADE,
  summary TEXT NOT NULL,
  message_count INTEGER NOT NULL, -- Number of messages summarized
  token_count INTEGER NOT NULL, -- Token count of the original messages
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- ---- Updated At Trigger ----
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER LANGUAGE plpgsql
AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$;

CREATE TRIGGER conversations_updated_at
  BEFORE UPDATE ON public.conversations
  FOR EACH ROW EXECUTE FUNCTION update_updated_at();
Supabase as the Agent's Memory Layer
PostgreSQL stores conversation history — every message persisted for session continuity
Summarize long conversations to control token costs — store summaries in a separate table.
Next.js Integration: Streaming API and Server Actions
The Next.js integration has two concerns: the API layer that processes agent requests, and the streaming layer that delivers token-by-token responses to the client. The Vercel AI SDK (ai package) handles both: it provides useChat for the client and streamText for the server.
The API route receives the user's message, loads conversation history from Supabase, runs the agent, and streams the response. The client uses useChat to manage the message list and display streaming tokens. The key pattern: the Route Handler returns streamText(...).toDataStreamResponse(), which is a streaming response that the client consumes incrementally.
For non-streaming use cases (background processing, webhook-triggered agents), use Server Actions instead. They run the agent synchronously and return the result. Server Actions are simpler but do not support streaming — the user waits for the full response.
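The difference in perceived latency between the two modes can be shown with a dependency-free sketch. It deliberately does not use the Vercel AI SDK: `fakeTokenStream`, `consumeStreaming`, and `consumeAtOnce` are hypothetical stand-ins for the model output, a streaming client, and a wait-for-everything client.

```typescript
// Producer: stands in for the model emitting tokens one at a time.
async function* fakeTokenStream(text: string): AsyncGenerator<string> {
  for (const token of text.split(' ')) {
    await Promise.resolve() // yield to the event loop, as a network chunk would
    yield token + ' '
  }
}

// Streaming consumer: the UI can re-render after every chunk.
async function consumeStreaming(
  stream: AsyncIterable<string>,
  onChunk: (partial: string) => void,
): Promise<string> {
  let text = ''
  for await (const chunk of stream) {
    text += chunk
    onChunk(text)
  }
  return text
}

// Non-streaming consumer: nothing is visible until the whole response exists.
async function consumeAtOnce(stream: AsyncIterable<string>): Promise<string> {
  const chunks: string[] = []
  for await (const chunk of stream) chunks.push(chunk)
  return chunks.join('')
}

const partials: string[] = []
const streamed = consumeStreaming(fakeTokenStream('Your order has shipped'), (p) => {
  partials.push(p)
})
const buffered = consumeAtOnce(fakeTokenStream('Your order has shipped'))

streamed.then((full) => console.log(`streaming: ${partials.length} UI updates, final="${full.trim()}"`))
buffered.then((full) => console.log(`buffered: 1 UI update, final="${full.trim()}"`))
```

Both consumers end with the same text; the streaming one simply gives the user something to read after the first chunk, which is why it suits user-facing chat.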
Streaming (Route Handler + streamText): real-time user-facing chat — user sees tokens as they generate
Server Actions: background processing, webhook-triggered agents, batch operations — no streaming
Route Handlers must return streamText().toDataStreamResponse() — not a plain Response or JSON
useChat manages the message list and streaming state — do not manually manage messages
Server Actions cannot stream — the user waits for the full response before seeing anything
Production Insight
Route Handlers with streamText().toDataStreamResponse() enable real-time token streaming to the client.
Server Actions are simpler but cannot stream — use them for background processing only.
Rule: use Route Handlers for user-facing chat, Server Actions for background agent tasks.
Key Takeaway
Streaming requires Route Handler + streamText() + useChat on the client.
Server Actions run agents synchronously — no streaming, simpler but slower perceived performance.
Store every message in Supabase — conversation history enables session continuity.
Tools: Building and Validating Agent Capabilities
Tools are the agent's external capabilities — each tool is a function with a name, description, and input schema. The LLM reads the tool descriptions to decide which tool to call, and parses the input schema to generate valid arguments. The quality of the description directly affects the agent's ability to use the tool correctly.
The critical pattern: validate tool inputs with Zod before execution. LLMs can hallucinate invalid inputs — missing required fields, wrong types, or malformed queries. Zod validation catches these before the tool executes, returning a clear error message that the agent can learn from.
Tool design principles: descriptions should be specific (not just "search the database"), input schemas should use .describe() on every field (the LLM reads these descriptions), and error messages should be actionable (tell the agent what went wrong and how to fix it).
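The validate-then-report pattern can be sketched without any dependencies. The hand-rolled checks below stand in for a Zod schema so the example runs on its own; `validateSearchArgs`, `SearchArgs`, and the exact error wording are assumptions for illustration. The limits mirror the search tool in this guide (3 to 200 characters, 1 to 10 results, default 5).

```typescript
interface SearchArgs {
  query: string
  category?: string
  maxResults: number
}

type Validation = { ok: true; value: SearchArgs } | { ok: false; error: string }

const CATEGORIES = ['product', 'pricing', 'troubleshooting', 'general']

// Checks raw model-generated arguments and, on failure, returns a message
// that tells the agent exactly which fields to fix before retrying.
function validateSearchArgs(raw: unknown): Validation {
  if (typeof raw !== 'object' || raw === null) {
    return { ok: false, error: 'Arguments must be an object with a "query" string.' }
  }
  const args = raw as Record<string, unknown>
  const problems: string[] = []

  const query = args.query
  if (typeof query !== 'string') problems.push('"query" is required and must be a string')
  else if (query.length < 3) problems.push('"query" must be at least 3 characters')
  else if (query.length > 200) problems.push('"query" must be under 200 characters')

  const category = args.category
  if (category !== undefined && (typeof category !== 'string' || CATEGORIES.indexOf(category) === -1)) {
    problems.push(`"category" must be one of: ${CATEGORIES.join(', ')}`)
  }

  const maxResults = args.maxResults === undefined ? 5 : args.maxResults
  if (typeof maxResults !== 'number' || maxResults < 1 || maxResults > 10) {
    problems.push('"maxResults" must be a number between 1 and 10')
  }

  if (problems.length > 0) {
    return { ok: false, error: `Invalid arguments: ${problems.join('; ')}. Fix these fields and call the tool again.` }
  }
  return {
    ok: true,
    value: { query: query as string, category: category as string | undefined, maxResults: maxResults as number },
  }
}

console.log(validateSearchArgs({ query: 'ab', maxResults: 50 }))
console.log(validateSearchArgs({ query: 'refund policy', category: 'pricing' }))
```

Returning the error as a string, rather than throwing, keeps the failure inside the agent loop where the model can read it and correct its next call.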
io.thecodeforge.ai-agent.tools.ts (TypeScript)
// ============================================
// Tool Design Patterns
// ============================================
import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'

// Queries already served. This set is shared by every request handled by this
// server instance, so scope it per conversation before using it in production.
const seenQueries = new Set<string>()

// ---- Good Tool: Clear description, validated inputs, actionable errors + deduplication ----
export const searchKnowledgeBase = new DynamicStructuredTool({
  name: 'search_knowledge_base',
  description: `Search the internal knowledge base for answers to customer questions. Use this when the user asks about product features, pricing, or troubleshooting.
Do NOT use this tool for:
- Order status queries (use get_order_status instead)
- Calculations (use calculate tool instead)
- Questions about the current date or time`,
  schema: z.object({
    query: z.string()
      .min(3, 'Query must be at least 3 characters')
      .max(200, 'Query must be under 200 characters')
      .describe('The search query: use specific keywords from the user question. Avoid full sentences.'),
    category: z.enum(['product', 'pricing', 'troubleshooting', 'general'])
      .optional()
      .describe('Filter by category if the topic is clear from the question'),
    maxResults: z.number()
      .min(1)
      .max(10)
      .default(5)
      .describe('Maximum number of results to return: use 3 for focused answers, 5 for comprehensive'),
  }),
  func: async ({ query, category, maxResults }) => {
    try {
      // Deduplication guardrail
      const resultHash = JSON.stringify({ query, category }).slice(0, 100)
      if (seenQueries.has(resultHash)) {
        return "You've already seen this result. Answer from context or ask for clarification."
      }
      seenQueries.add(resultHash)

      // Generate embedding for the query
      const embedding = await generateEmbedding(query)

      // Search Supabase vector store
      const { data, error } = await supabase.rpc('match_documents', {
        query_embedding: embedding,
        match_count: maxResults,
        match_threshold: 0.8, // Production value
        filter_category: category ?? null,
      })
      if (error) {
        return `Search failed: ${error.message}. Try a simpler query or remove the category filter.`
      }
      if (!data || data.length === 0) {
        return `No results found for "${query}". Try:
1. Using different keywords
2. Removing the category filter
3. Broadening the search terms`
      }
      return data.map((doc: any, i: number) =>
        `[Result ${i + 1}] (similarity: ${doc.similarity.toFixed(2)})
Title: ${doc.title}
Content: ${doc.content.slice(0, 500)}...`
      ).join('\n\n')
    } catch (err) {
      return `Search error: ${err instanceof Error ? err.message : 'Unknown error'}. The user should try again or contact support.`
    }
  },
})

// ---- Good Tool: Input validation with meaningful error messages ----
export const createTicket = new DynamicStructuredTool({
  name: 'create_support_ticket',
  description: 'Create a support ticket when the agent cannot resolve the issue. Use this as a last resort after exhausting available tools.',
  schema: z.object({
    subject: z.string()
      .min(10, 'Subject must be at least 10 characters: provide a clear summary')
      .max(200)
      .describe('Brief subject line summarizing the issue'),
    description: z.string()
      .min(50, 'Description must be at least 50 characters: include what was tried and what failed')
      .max(2000)
      .describe('Detailed description including what the user tried and what went wrong'),
    priority: z.enum(['low', 'medium', 'high', 'urgent'])
      .describe('Priority based on impact: low = question, medium = minor issue, high = blocked, urgent = production down'),
    category: z.enum(['billing', 'technical', 'account', 'feature_request'])
      .describe('Category of the issue'),
  }),
  func: async ({ subject, description, priority, category }) => {
    // Validate business rules
    if (priority === 'urgent' && category !== 'technical') {
      return 'Urgent priority is only available for technical issues. Please use high priority instead.'
    }
    const ticket = await createTicketInDatabase({ subject, description, priority, category })
    return `Ticket created successfully.
Ticket ID: ${ticket.id}
Priority: ${priority}
Expected response time: ${getResponseTime(priority)}
Tell the user: "I've created a support ticket (ID: ${ticket.id}) for you. A team member will respond within ${getResponseTime(priority)}."`
  },
})

// ---- Helper: Generate embedding ----
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  })
  return response.data[0].embedding
}

function getResponseTime(priority: string): string {
  const times: Record<string, string> = {
    low: '48 hours',
    medium: '24 hours',
    high: '4 hours',
    urgent: '1 hour',
  }
  return times[priority] ?? '24 hours'
}
Tool Design Best Practices
Descriptions must explain WHEN to use the tool — not just what it does
Every input field needs .describe() — the LLM reads these to generate valid arguments
Validate inputs with Zod before execution — LLMs can hallucinate invalid inputs
Error messages must be actionable — tell the agent what went wrong and how to retry
Return formatted text, not raw JSON — LLMs parse natural language better than JSON
Production Insight
Tool descriptions are the LLM's instructions — vague descriptions cause wrong tool selection.
Validate inputs with Zod — LLMs hallucinate invalid arguments, especially in multi-step workflows.
Error messages must guide the agent — tell it what failed and how to retry with different inputs.
Cost Control: Token Management and Conversation Summarization
Token costs are the primary production concern for AI agents. Each agent iteration is a separate LLM call that includes the full conversation history, tool descriptions, and the agent's reasoning. A 20-turn conversation with 10 tool calls per turn is 200 LLM calls and can consume 100,000+ tokens. Because every call resends the growing history, the real total is often several times higher, which is how a single conversation can reach $1-10 on a larger model.
Three strategies control costs: history truncation (keep only the last N messages), conversation summarization (replace old messages with a summary), and model selection (use gpt-4o-mini for simple tasks, gpt-4o for complex reasoning). Summarization is the most effective — it preserves context while reducing token count by 80-90%.
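History truncation is the simplest of the three, and a short sketch shows the one detail that is easy to get wrong: the walk should start from the newest message so that recent context survives. `truncateHistory` and the `Msg` shape are hypothetical names for illustration, and token counts are assumed to be stored per message.

```typescript
interface Msg {
  role: string
  content: string
  tokenCount: number
}

// Keeps the most recent messages that fit within maxTokens (and maxMessages),
// returned in chronological order.
function truncateHistory(messages: Msg[], maxTokens: number, maxMessages = 20): Msg[] {
  const kept: Msg[] = []
  let total = 0
  // Walk backwards so the newest messages are considered first.
  for (let i = messages.length - 1; i >= 0 && kept.length < maxMessages; i--) {
    if (total + messages[i].tokenCount > maxTokens) break
    total += messages[i].tokenCount
    kept.push(messages[i])
  }
  return kept.reverse() // restore oldest-to-newest order for the LLM
}

const history: Msg[] = [
  { role: 'user', content: 'First question', tokenCount: 100 },
  { role: 'assistant', content: 'First answer', tokenCount: 200 },
  { role: 'user', content: 'Follow-up', tokenCount: 300 },
  { role: 'assistant', content: 'Latest answer', tokenCount: 400 },
]

const trimmed = truncateHistory(history, 750)
console.log(trimmed.map((m) => m.content).join(' | '))
```

Truncation is cheap but lossy: anything outside the window is gone, which is the gap summarization is meant to fill.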
// ============================================
// Cost Control: Token Management
// ============================================
import { ChatOpenAI } from '@langchain/openai'
import { createClient } from '@/lib/supabase/server'

// ---- Strategy 1: Conversation Summarization ----
// Summarize old messages when the conversation gets too long
// Reduces token count by 80-90% while preserving context
export async function getConversationHistory(conversationId: string, maxTokens: number = 8000) {
  const supabase = await createClient()

  // Get the latest summary (if any)
  const { data: summary } = await supabase
    .from('conversation_summaries')
    .select('summary, message_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: false })
    .limit(1)
    .single()

  // Get messages after the summary
  const offset = summary?.message_count ?? 0
  const { data: recentMessages } = await supabase
    .from('messages')
    .select('role, content, token_count')
    .eq('conversation_id', conversationId)
    .order('created_at', { ascending: true })
    .range(offset, offset + 49)
  if (!recentMessages) return []

  // Build the history
  const history: { role: string; content: string }[] = []

  // Add summary as a system message if it exists
  if (summary) {
    history.push({
      role: 'system',
      content: `[Previous conversation summary]: ${summary.summary}`,
    })
  }

  // Add recent messages
  let totalTokens = summary ? estimateTokens(summary.summary) : 0
  for (const msg of recentMessages) {
    if (totalTokens + (msg.token_count ?? 0) > maxTokens) {
      break // Stop adding messages when we hit the token limit
    }
    history.push({ role: msg.role, content: msg.content })
    totalTokens += msg.token_count ?? 0
  }
  return history
}

// ---- Strategy 2: Model Selection Based on Task Complexity ----
// Use cheaper models for simple tasks, expensive models for complex reasoning
export function selectModel(taskComplexity: 'simple' | 'moderate' | 'complex') {
  const models = {
    simple: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 500, // Limit output tokens for simple responses
    }),
    moderate: new ChatOpenAI({
      modelName: 'gpt-4o-mini',
      temperature: 0,
      maxTokens: 1000,
    }),
    complex: new ChatOpenAI({
      modelName: 'gpt-4o',
      temperature: 0,
      maxTokens: 2000,
    }),
  }
  return models[taskComplexity]
}

// ---- Strategy 3: Token Budget Per Conversation ----
// Track and enforce token limits per conversation
export async function checkTokenBudget(conversationId: string): Promise<boolean> {
  const supabase = await createClient()
  const MAX_TOKENS_PER_CONVERSATION = 50000
  const { data: conversation } = await supabase
    .from('conversations')
    .select('total_tokens')
    .eq('id', conversationId)
    .single()
  if (!conversation) return true
  if (conversation.total_tokens >= MAX_TOKENS_PER_CONVERSATION) {
    // Summarize and continue
    await summarizeConversation(conversationId)
    return true
  }
  return true
}

// ---- Cost Estimation ----
// Estimate cost before running the agent
const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.50 / 1_000_000, output: 10.00 / 1_000_000 },
  'gpt-4o-mini': { input: 0.15 / 1_000_000, output: 0.60 / 1_000_000 },
  'text-embedding-3-small': { input: 0.02 / 1_000_000, output: 0 },
}

export function estimateCost(
  modelName: string,
  inputTokens: number,
  outputTokens: number
): number {
  const costs = MODEL_COSTS[modelName] ?? MODEL_COSTS['gpt-4o']
  return (inputTokens * costs.input) + (outputTokens * costs.output)
}

// ---- Usage Tracking ----
// Store token usage per conversation for billing and monitoring
export async function trackUsage(
  conversationId: string,
  modelName: string,
  inputTokens: number,
  outputTokens: number
) {
  const supabase = await createClient()
  const cost = estimateCost(modelName, inputTokens, outputTokens)
  await supabase.from('usage_logs').insert({
    conversation_id: conversationId,
    model: modelName,
    input_tokens: inputTokens,
    output_tokens: outputTokens,
    cost,
  })
  // Update the conversation total through the Postgres function.
  // rpc() must be awaited on its own: its result cannot be passed as a column value to update().
  await supabase.rpc('increment_conversation_tokens', {
    conv_id: conversationId,
    tokens: inputTokens + outputTokens,
  })
}
Token Costs Scale with Conversation Length
Each agent iteration sends the full conversation history to the LLM — costs compound
A 20-turn conversation with 10 tool calls per turn makes about 200 LLM calls, which easily adds up to 100,000+ tokens
Summarization reduces token count by 80-90% while preserving context
Use gpt-4o-mini for simple tasks ($0.15/M input tokens) vs gpt-4o ($2.50/M input tokens)
Track per-conversation token usage in Supabase — set hard limits and alert on budget exceedance
Production Insight
Each agent iteration includes the full conversation history — costs compound with every turn.
Summarization reduces token count by 80-90% while preserving key context.
Rule: summarize after 8,000 tokens, use gpt-4o-mini for simple tasks, track per-conversation costs.
Key Takeaway
Token costs are the primary production concern — each iteration sends full history to the LLM.
Summarization, model selection, and token budgets control spend — implement all three.
Track per-conversation usage in Supabase — set hard limits and alert on exceedance.
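To make the compounding concrete, the sketch below adds up input tokens when every LLM call resends the whole history. It is a rough model, not a measurement: the 250-token message size and the helper names (cumulativeInputTokens, inputCostUsd) are illustrative assumptions, and the prices are the per-million input rates quoted above.

```typescript
// Sketch only: the per-message token size is an assumed round number.
const INPUT_PRICE_PER_TOKEN: Record<string, number> = {
  'gpt-4o': 2.5 / 1_000_000,
  'gpt-4o-mini': 0.15 / 1_000_000,
}

// Total input tokens when call k resends all k messages accumulated so far.
function cumulativeInputTokens(llmCalls: number, tokensPerMessage: number): number {
  let total = 0
  for (let call = 1; call <= llmCalls; call++) {
    total += call * tokensPerMessage // history grows by one message per call
  }
  return total
}

function inputCostUsd(model: string, inputTokens: number): number {
  const price = INPUT_PRICE_PER_TOKEN[model]
  if (price === undefined) throw new Error(`Unknown model: ${model}`)
  return inputTokens * price
}

const tokens = cumulativeInputTokens(20, 250) // 250 * (1 + 2 + ... + 20) = 52,500
console.log(tokens)
console.log(inputCostUsd('gpt-4o', tokens), inputCostUsd('gpt-4o-mini', tokens))
```

Twenty calls send 210 message-copies rather than 20, which is why trimming or summarizing history pays off faster than shaving individual prompts.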
Deployment: Environment, Monitoring, and Failure Handling
Production deployment requires three additions beyond the development setup: environment variable management (API keys, Supabase credentials), monitoring (token usage, error rates, latency), and failure handling (tool errors, LLM timeouts, rate limits).
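The deployment target matters. Vercel caps how long a Serverless Function may run: this guide assumes a 10-second default, extendable to 300 seconds on the Pro plan, but the limits depend on plan and configuration, so check the current values for your project. Agent loops with multiple tool calls can exceed the cap. The Edge Runtime is not a full escape hatch: it can keep a streaming response open longer, but it must start responding quickly and it lacks Node.js APIs. For long-running agents, stream partial output as early as possible, or deploy the agent logic to a separate service (AWS Lambda with its 15-minute timeout, or a containerized service).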
io.thecodeforge.ai-agent.deployment.ts (TypeScript)
// ============================================
// Deployment Configuration
// ============================================

// ---- Environment Variables (.env.local) ----
// NEVER commit API keys to the repository
// OPENAI_API_KEY=sk-...
// SUPABASE_URL=https://your-project.supabase.co
// SUPABASE_ANON_KEY=eyJ...
// SUPABASE_SERVICE_ROLE_KEY=eyJ... (for server-side only)
// NODE_ENV=production

// ---- next.config.ts ----
// Configure for AI workloads
import type { NextConfig } from 'next'

const nextConfig: NextConfig = {
  // Increase body size limit for conversation history
  experimental: {
    serverActions: {
      bodySizeLimit: '2mb',
    },
  },
  // Configure for Vercel deployment
  serverExternalPackages: ['@langchain/openai', '@langchain/core'],
}

export default nextConfig

// ---- Rate Limiting (Upstash Redis: works in serverless) ----
// An in-memory Map does not survive across serverless invocations, so use Redis
import { NextRequest, NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
})
const ratelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per minute
})

// ---- Middleware: Rate Limiting ----
// Prevent abuse by limiting requests per client
export async function rateLimitMiddleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith('/api/chat')) {
    // x-forwarded-for can hold a comma-separated chain; the first entry is the client
    const ip = request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'unknown'
    const { success } = await ratelimit.limit(ip)
    if (!success) {
      return NextResponse.json(
        { error: 'Rate limit exceeded. Please wait before sending another message.' },
        { status: 429 }
      )
    }
  }
  return NextResponse.next()
}

export const config = {
  matcher: ['/api/:path*'],
}

// ---- Error Handling Wrapper ----
// Catch and log agent errors for observability
export async function runAgentWithErrorHandling(
  conversationId: string,
  input: string
) {
  const startTime = Date.now()
  try {
    const agent = await createAgent(conversationId)
    const result = await agent.invoke({ input })
    // Log success
    await logAgentRun({
      conversationId,
      status: 'success',
      duration: Date.now() - startTime,
      steps: result.intermediateSteps?.length ?? 0,
      input,
      output: result.output,
    })
    return result
  } catch (error) {
    // Log failure
    await logAgentRun({
      conversationId,
      status: 'error',
      duration: Date.now() - startTime,
      error: error instanceof Error ? error.message : 'Unknown error',
      input,
    })
    // Return user-friendly error
    if (error instanceof Error && error.message.includes('timeout')) {
      return {
        output: 'The request took too long to process. Please try a simpler question or try again later.',
      }
    }
    if (error instanceof Error && error.message.includes('rate_limit')) {
      return {
        output: 'The AI service is currently experiencing high demand. Please wait a moment and try again.',
      }
    }
    return {
      output: 'I encountered an error processing your request. A support ticket has been created automatically.',
    }
  }
}

async function logAgentRun(data: Record<string, unknown>) {
  const supabase = await createClient()
  await supabase.from('agent_runs').insert(data)
}

// ---- Health Check Endpoint ----
// File: app/api/health/route.ts
export async function GET() {
  const checks = {
    openai: false,
    supabase: false,
  }
  // Check OpenAI
  try {
    const response = await fetch('https://api.openai.com/v1/models', {
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    })
    checks.openai = response.ok
  } catch {
    checks.openai = false
  }
  // Check Supabase
  try {
    const supabase = await createClient()
    const { error } = await supabase.from('conversations').select('id').limit(1)
    checks.supabase = !error
  } catch {
    checks.supabase = false
  }
  const healthy = Object.values(checks).every(Boolean)
  return Response.json(
    { status: healthy ? 'healthy' : 'degraded', checks },
    { status: healthy ? 200 : 503 }
  )
}
Deployment Considerations
Vercel Serverless Functions have a 10-second timeout (300s on Pro) — agent loops may exceed this
Edge Runtime can hold a streaming response open longer, but it must start responding quickly and has no Node.js APIs
Rate limiting prevents abuse — 10 messages per minute per user is a reasonable default
Health checks verify OpenAI and Supabase connectivity — return 503 if either is down
Log every agent run with duration, steps, and errors — observability is critical for debugging
Production Insight
Vercel Serverless Functions have a 10-second timeout — agent loops with many tools may exceed it.
Rate limiting prevents abuse — 10 messages per minute per user is a reasonable baseline.
Rule: log every agent run with duration and error details — observability is critical for production.
Key Takeaway
Deployment requires rate limiting, health checks, and error handling — not just the agent code.
Vercel timeout limits constrain agent loop length — consider Edge Runtime or separate services.
Log every agent run — duration, steps, errors, and token usage for observability.
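One way to stay inside a platform time limit is to race the agent call against a deadline and return a fallback message if the deadline wins. The sketch below shows that idea only; withDeadline, the timings, and the stand-in agent promise are hypothetical names chosen for illustration.

```typescript
// Sketch only: withDeadline and the timings below are illustrative assumptions.
function withDeadline<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // If the timer fires first, resolve with the fallback value.
    const timer = setTimeout(() => resolve(fallback), ms)
    work.then(
      (value) => {
        clearTimeout(timer)
        resolve(value)
      },
      (error) => {
        clearTimeout(timer)
        reject(error)
      },
    )
  })
}

// Usage with a stand-in for the real agent call (50 ms of "work", 10 ms budget).
const slowAgent = new Promise<string>((resolve) => setTimeout(() => resolve('agent answer'), 50))
withDeadline(slowAgent, 10, 'Still working on it. Please retry in a moment.').then(console.log)
```

Note that the slower promise is ignored, not cancelled: the underlying LLM call keeps running and keeps costing tokens unless the agent itself is given an abort mechanism.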
What No One Tells You About Prerequisites — And Why Missing One Wastes a Day
You don't need a laundry list of packages. You need a runtime that won't silently fail on streaming. Node 18+ is mandatory. Anything less and LangChain's callback handlers will hang your agent mid-loop with zero error messages.
Your local database matters too. Supabase local development with supabase start is non-negotiable. Prod-like indexes, real-time subscriptions, and row-level security all behave differently in the cloud if you've been testing against a remote project from day one. Burn that hour upfront.
CopilotKit's @copilotkit/react-core expects a context provider wrapping your Next.js app. Forget this and your agent will appear deaf — no errors, just an unresponsive chat window. The docs bury this. Now you know.
Last trap: OpenAI keys with no budget cap. Your agent's first runaway loop costs you $12 before you catch it. Set a hard limit in the OpenAI dashboard. Your wallet will thank you.
PrerequisiteCheck.js (JavaScript)
// io.thecodeforge: javascript tutorial
// Verify environment before you write a single agent.
// Run as an ES module (for example with "type": "module"): it uses import and top-level await.
import { createClient } from '@supabase/supabase-js';

const REQUIRED_NODE_VERSION = 18;
const nodeMajor = parseInt(process.version.slice(1).split('.')[0], 10);
if (nodeMajor < REQUIRED_NODE_VERSION) {
  console.error(`FATAL: Node ${process.version} detected. Need >= ${REQUIRED_NODE_VERSION}.`);
  process.exit(1);
}

// Confirm the local Supabase stack is alive
const localSupabase = createClient(
  'http://localhost:54321',
  process.env.SUPABASE_LOCAL_SERVICE_ROLE_KEY
);
// A one-row read is enough to prove the API and the table are reachable
const { error } = await localSupabase.from('conversations').select('id').limit(1);
if (error) {
  console.error('Supabase local container not responding. Run `supabase start` first.');
  process.exit(1);
}
console.log(`Env validated. Node ${process.version}, Supabase live. Building agent.`);
Output
Env validated. Node v20.11.0, Supabase live. Building agent.
Production Trap:
Never source your Supabase URL from a .env in CI. Use supabase link and supabase secrets instead. Hardcoded project refs produce silent 500s when your staging Supabase project restarts overnight.
Key Takeaway
Prerequisites are not a checklist — they are a gating process. Test every dependency before you write one line of agent logic.
The Dependency Install That Breaks — And How to Preempt It
npm install @copilotkit/react-core @copilotkit/runtime langchain @langchain/openai @supabase/supabase-js looks innocent. (Note the client package name: the bare supabase package on npm is the CLI, not the client library.) But LangChain's peer dependency on zod can conflict with the version CopilotKit pulls in. You end up with two zod instances in your bundle, and your tool schemas silently pass anything, including malformed JSON.
Use npm ls zod after install. If you see more than one version, run npm dedupe and pray. Better yet: use a .npmrc with legacy-peer-deps=true for LangChain's ecosystem. It's ugly. It works.
CopilotKit's runtime package requires openai as a peer. LangChain uses its own wrapper. Import order matters. Pull in @langchain/openai before @copilotkit/runtime. If you reverse them, you get cryptic errors about missing ChatOpenAI constructors.
One more footgun: Supabase's @supabase/ssr package for Next.js App Router. Your server actions will drop session cookies unless you use the createServerClient factory exactly as their docs show. Copy-paste it. Don't reimplement.
DependencyCheck.js (JavaScript)
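// io.thecodeforge: javascript tutorial
// Run this after install to catch version fights
import { execSync } from 'child_process';

// Walk the `npm ls` tree and record which package pulls in which zod version
function collectZod(node, parent = '(root)', found = {}) {
  for (const [name, dep] of Object.entries(node.dependencies || {})) {
    if (name === 'zod' && dep.version) found[parent] = { version: dep.version };
    collectZod(dep, name, found);
  }
  return found;
}

try {
  // --all includes nested copies; --depth=0 would hide them
  const output = execSync('npm ls zod --all --json', { encoding: 'utf8' });
  const deps = collectZod(JSON.parse(output));
  const uniqueVersions = new Set(Object.values(deps).map(d => d.version));
  if (uniqueVersions.size > 1) {
    console.warn('WARN: Multiple zod versions detected. Run `npm dedupe`.');
    console.table(deps);
  } else {
    console.log('OK: Single zod version in tree.');
  }
} catch {
  console.log('npm ls failed: zod is not installed yet, or the tree has unmet dependencies.');
}
Output (illustrative; table layout varies by Node version)
WARN: Multiple zod versions detected. Run `npm dedupe`.
┌─────────────────────┬──────────┐
│ (index)             │ version  │
├─────────────────────┼──────────┤
│ langchain           │ '3.22.4' │
│ @copilotkit/runtime │ '3.21.0' │
└─────────────────────┴──────────┘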
Senior Shortcut:
Lock LangChain to @langchain/core@0.1.50; the author reports that anything newer breaks CopilotKit's runtime. Hardcode this in the overrides field of your package.json (resolutions is the Yarn equivalent and is ignored by npm). You'll thank me when the next major ships.
Key Takeaway
Package resolution in the LangChain + CopilotKit + Supabase stack is a bug farm. Validate your tree, pin versions, and import in the right order.
Why Your Agent Can't See Supabase Data — And How to Fix It
You wired the database. The agent calls a tool named searchBlogs. The tool returns nothing. You check Supabase — rows are there. The problem isn't your SQL. It's the invisible wall between CopilotKit's frontend runtime and your Supabase client.
CopilotKit agents run on the server. When you pass supabase as a client to your tool function, you're passing a browser-side client. It can't see rows protected by Row Level Security unless you forward the authenticated session. The agent gets zero results. No error. Just silence.
Solution: create a server-side Supabase client inside your tool handler using the user's JWT from the request context. CopilotKit's CopilotRuntime exposes context.user with the token. Use createServerClient with that token. Every tool call then runs with the user's permissions.
Second trap: vector search queries return null metadata when your match_documents function doesn't select the right columns. Always explicitly select metadata -> 'title' in your Postgres function. The agent needs structured data, not a JSON blob.
SupabaseAuthTool.js (JavaScript)
// io.thecodeforge: javascript tutorial
// Server-side tool with authenticated Supabase access
import { createServerClient } from '@supabase/ssr';
import { tool } from '@langchain/core/tools';
import { z } from 'zod';

export const searchUserBlogs = tool(
  async ({ query, userToken }) => {
    // Use the anon key, not service_role, so Row Level Security still applies
    const supabase = createServerClient(
      process.env.SUPABASE_URL,
      process.env.SUPABASE_ANON_KEY,
      {
        cookies: {
          getAll: () => [], // Not used here: the token is passed directly
          setAll: () => {},
        },
        auth: {
          autoRefreshToken: false,
          detectSessionInUrl: false,
        },
        // Send the user's JWT with every query so RLS evaluates as that user
        global: { headers: { Authorization: `Bearer ${userToken}` } },
      }
    );
    // Verify the JWT before querying
    const { data, error } = await supabase.auth.getUser(userToken);
    if (error || !data.user) {
      return JSON.stringify({ error: 'Session expired' });
    }
    const { data: blogs } = await supabase
      .from('blogs')
      .select('title, content, created_at')
      .textSearch('content', query)
      .limit(5);
    return JSON.stringify(blogs || []);
  },
  {
    name: 'searchUserBlogs',
    description: 'Search the authenticated user\'s blog posts',
    schema: z.object({
      query: z.string(),
      // In a real agent, inject the token from request context instead of letting the LLM supply it
      userToken: z.string(),
    }),
  }
);
Output
[{"title":"Building AI Agents","content":"...","created_at":"2024-03-15T10:00:00Z"}]
Production Trap:
Never use the service_role key here. It bypasses RLS and exposes every user's data. Use the user's JWT + createServerClient to enforce row-level security. Your clients will thank you when the audit comes.
Key Takeaway
Your agent's Supabase queries fail because the auth context is wrong — not the SQL. Always pass the user JWT into tool handlers and create a fresh server client.
The Dependency Install That Breaks — And How to Preempt It
You think npm install langchain is safe? Wrong. The LangChain.js ecosystem is a minefield of breaking updates that nuke your agent loop before it runs. You pull a version that dropped Chain in favor of Runnable without warning — now your imports fail silently, your streaming endpoint crashes, and you waste a day tracing phantom bugs.
The fix: lock your versions before you even think about coding. Use npm install langchain@0.1.x — the stable 0.1 series. Pair it with @langchain/openai@0.0.x and @langchain/community@0.0.x. Forget that, and you'll hit Cannot find module '@langchain/core/dist/runnables' at 2 AM.
Second: test your install with a single import before wiring it into Next.js. Run node -e "const { OpenAI } = require('@langchain/openai'); console.log('ok')". If it errors, the packages are probably resolving to mismatched versions of @langchain/core. Run npm ls @langchain/core, and if more than one version appears, pin a single one with an overrides entry in package.json. That usually settles the "why can't LangChain see OpenAI" problem.
Never use npm update on LangChain packages without reading changelogs. A single ^0.2.0 bump can rename or remove the APIs your streaming logic depends on.
Key Takeaway
Lock LangChain to 0.1.x before a single import. Test install standalone before touching Next.js routes.
Why Your Agent Can't See Supabase Data — And How to Fix It
You built a tool that queries Supabase for user invoices. Your agent returns "I can't find that data" even though the Supabase table is full. Problem: you passed raw SQL to an AI agent that doesn't know your schema. The agent generates hallucinated column names like user_email when your table has email_address. That's not a bug — it's a broken tool design.
Fix: wrap Supabase queries in validated tools that enforce structure before the agent touches them. Use Zod schemas to sanitize inputs and return predictable JSON. For example, your getInvoice tool should accept only userId as a string, query the correct table with typed columns, and return { status: 'found' | 'missing', data: object } every time.
Second: give the agent a schema context. When you define the tool, include a description that lists exact column names: "Find invoices by userId. Table fields: id, email_address, amount, created_at." This stops the agent from inventing its own field names. Test the tool in isolation — call it directly, see the output, then wire it into the agent. If your tool works but the agent fails, your schema description is wrong, not your code.
Always hardcode column names in the tool description. If the agent still hallucinates, prefix your description with 'CRITICAL: Do not change field names.' Works 90% of the time.
Key Takeaway
Your agent doesn't guess Supabase columns — it hallucinates them. Validate schema in the tool, spell out fields in the description.
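The tool contract described above can be sketched without any dependencies. In this sketch the Invoice fields follow the column list quoted in the text, fetchInvoice is a hypothetical stand-in for the real Supabase query, and the hand-rolled check plays the role a Zod schema would play in the actual tool. The extra invalid_input status is an assumption added for illustration.

```typescript
// Sketch only: fetchInvoice stands in for the Supabase query so this runs without a database.
type Invoice = { id: string; email_address: string; amount: number; created_at: string }
type InvoiceResult =
  | { status: 'found'; data: Invoice }
  | { status: 'missing'; data: null }
  | { status: 'invalid_input'; message: string }

type FetchInvoice = (userId: string) => Invoice | undefined

function getInvoice(input: unknown, fetchInvoice: FetchInvoice): InvoiceResult {
  // Hand-rolled validation; a Zod schema would replace this in the real tool.
  const userId = (input as { userId?: unknown } | null)?.userId
  if (typeof userId !== 'string' || userId.trim() === '') {
    return { status: 'invalid_input', message: 'userId must be a non-empty string' }
  }
  const invoice = fetchInvoice(userId)
  return invoice ? { status: 'found', data: invoice } : { status: 'missing', data: null }
}

const fakeTable: Record<string, Invoice> = {
  u1: { id: 'inv_1', email_address: 'a@example.com', amount: 42, created_at: '2024-03-15' },
}
const lookup: FetchInvoice = (id) => fakeTable[id]

console.log(getInvoice({ userId: 'u1' }, lookup).status) // found
console.log(getInvoice({ userId: 'nope' }, lookup).status) // missing
console.log(getInvoice({ user_email: 'x' }, lookup).status) // invalid_input
```

Because every branch returns the same predictable shape, the agent gets an actionable message when it invents a field name instead of an exception it cannot interpret.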
Why Your LangChain Agent Fails in Production — And How Next.js App Router Fixes It
Most tutorials show agent loops in simple Node scripts. In production, you need request-scoped context, streaming to the UI, and error recovery without dropping user sessions. Next.js App Router covers the request side through Server Actions; the agent state itself lives in Supabase, not in the function. The key insight is that your agent loop must be a controlled, resumable process, not a single long-running function. Combining Server Actions with LangChain's RunnableWithMessageHistory gives you an agent that survives page refreshes and network interruptions: each action call loads the conversation from Supabase, executes one turn of the agent loop, and streams the response back. This pattern avoids the timeouts and memory leaks that are common in monolithic agent implementations.
Returns ReadableStream for Next.js streaming response
Production Trap:
Never create a new agent instance per request. Use dependency injection to share the compiled chain across requests — but scope the message history to the user session.
Key Takeaway
Server Actions + RunnableWithMessageHistory = durable, resumable agents.
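The load, run one turn, persist cycle can be sketched in a few lines. Everything here is a stand-in: the Map replaces the Supabase messages table, respond replaces the LLM call, and runTurn is a hypothetical name for what a Server Action would do on each request.

```typescript
// Sketch only: the Map stands in for Supabase and `respond` stands in for the LLM call.
type Message = { role: 'user' | 'assistant'; content: string }
const store = new Map<string, Message[]>() // hypothetical stand-in for the messages table

async function loadHistory(conversationId: string): Promise<Message[]> {
  return store.get(conversationId) ?? []
}

async function saveHistory(conversationId: string, history: Message[]): Promise<void> {
  store.set(conversationId, history)
}

// One request = load state, run exactly one turn, persist, return.
async function runTurn(
  conversationId: string,
  userInput: string,
  respond: (history: Message[]) => Promise<string>,
): Promise<string> {
  const history = await loadHistory(conversationId)
  const withUser: Message[] = [...history, { role: 'user', content: userInput }]
  const reply = await respond(withUser)
  await saveHistory(conversationId, [...withUser, { role: 'assistant', content: reply }])
  return reply
}

// Stub model: reports how many messages it was given.
const echo = async (history: Message[]): Promise<string> => `seen ${history.length} message(s)`

async function demo(): Promise<string[]> {
  const first = await runTurn('c1', 'hello', echo)
  const second = await runTurn('c1', 'still there?', echo) // picks up the stored history
  return [first, second]
}
demo().then(console.log) // [ 'seen 1 message(s)', 'seen 3 message(s)' ]
```

Nothing survives in the function between calls, so a cold start or a page refresh costs only one extra database read.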
The Vector Search Index That Wastes Your Credits — And How LangChain + Supabase Fixes It
Every vector search call costs you token credits and latency. Naive implementations re-embed the user query on every turn, often searching irrelevant chunks that burn through your context window. The fix is a two-tier retrieval system using Supabase's native vector extension and LangChain's parent document retriever. First, embed your documents once with OpenAI embeddings and store both chunk-level and document-level vectors in Supabase. Then, at query time, retrieve only relevant document IDs first, then fetch their children chunks. This reduces token consumption by 70% and eliminates irrelevant context injection. Supabase supports this natively with SQL functions that filter by metadata before performing the cosine similarity search. Your agent only sees the top-k relevant chunks, not the entire document base.
Retriever that returns only 5 parent documents (max 10k tokens) instead of 50 raw chunks (20k+ tokens)
Production Trap:
Default Supabase vector search returns raw chunks without deduplication. Add a GROUP BY document_id in your match_documents SQL function to avoid flooding the context window with overlapping content.
Key Takeaway
Parent document retriever cuts token costs 70% by fetching whole documents, not random chunks.
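The grouping step can be illustrated in memory. The sketch below scores chunks by cosine similarity, keeps the best score per document (the job GROUP BY document_id does in SQL), and returns the top-k document IDs for the caller to fetch. The vectors are tiny hand-made examples rather than real embeddings, and topDocuments is a hypothetical helper name.

```typescript
// Sketch only: vectors are hand-made two-dimensional examples, not real embeddings.
type Chunk = { documentId: string; text: string; embedding: number[] }

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Tier 1: score chunks and keep only the best score per document.
// Tier 2: return the top-k document IDs; the caller then fetches those documents.
function topDocuments(query: number[], chunks: Chunk[], k: number): string[] {
  const bestByDoc = new Map<string, number>()
  for (const chunk of chunks) {
    const score = cosineSimilarity(query, chunk.embedding)
    const current = bestByDoc.get(chunk.documentId)
    if (current === undefined || score > current) bestByDoc.set(chunk.documentId, score)
  }
  return Array.from(bestByDoc.entries())
    .sort((x, y) => y[1] - x[1])
    .slice(0, k)
    .map(([documentId]) => documentId)
}

const chunks: Chunk[] = [
  { documentId: 'pricing', text: 'Plans and tiers', embedding: [1, 0] },
  { documentId: 'pricing', text: 'Refund policy', embedding: [0.9, 0.1] },
  { documentId: 'setup', text: 'Install steps', embedding: [0, 1] },
  { documentId: 'faq', text: 'Billing questions', embedding: [0.7, 0.7] },
]
console.log(topDocuments([1, 0], chunks, 2)) // [ 'pricing', 'faq' ]
```

Without the per-document grouping, both pricing chunks would occupy the top two slots and the faq document would never reach the agent.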
Production incident · Post-mortem · Severity: high
Agent Infinite Loop Cost $4,200 in OpenAI Credits in 3 Hours
Symptom
The billing alert fired at 2 AM — OpenAI API spend had exceeded the daily budget by 84x. The agent logs showed 14,000 tool calls for a single conversation. The agent was stuck in a loop: search_knowledge_base -> result -> "I need more information" -> search_knowledge_base -> same result -> repeat.
Assumption
The AgentExecutor default maxIterations (15) would prevent infinite loops. The team assumed the agent would stop after 15 steps and return a partial answer.
Root cause
The team had set maxIterations to 0 (unlimited) during development to debug a complex multi-step workflow and never changed it back. The agent's prompt did not include a fallback instruction for when tools return no new information. The knowledge base search returned the same 3 results every time, but the agent kept rephrasing the query because the prompt said "search until you find the answer." Without a max iteration limit or a "stop if no new information" instruction, the loop continued indefinitely.
Fix
Set the iteration limit back to 10 for production (maxIterations on AgentExecutor, the setting this guide otherwise calls maxSteps). Added a tool result deduplication check: if the same search result is returned twice, the agent is instructed to synthesize an answer from available information instead of searching again. Added a hard cost ceiling via OpenAI usage limits ($100/day). Added per-conversation token tracking in Supabase, so that conversations exceeding 8,000 tokens trigger automatic summarization. Added an alert for any conversation exceeding 20 tool calls.
Key lesson
Always set maxSteps on the agent — unlimited loops will drain your API budget.
Include fallback instructions in the agent prompt: what to do when tools return no new information.
Set OpenAI usage limits as a safety net — they are the last line of defense against cost overruns.
Track per-conversation token usage — summarize or truncate history when it exceeds your budget threshold.
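The first two lessons can be combined into one guard around the loop. The sketch below is a simplified stand-in for an agent executor, not LangChain's implementation: the planner and tool are stubs, runGuardedLoop is a hypothetical name, and the default cap of 10 mirrors the fix described above.

```typescript
// Sketch only: the planner and tool are stubs; this is not LangChain's executor.
type ToolCall = { tool: string; input: string }
type Planner = (observations: string[]) => ToolCall | { final: string }
type LoopResult = { answer: string; steps: number; stoppedBy: 'final' | 'duplicate' | 'maxSteps' }

function runGuardedLoop(
  plan: Planner,
  callTool: (call: ToolCall) => string,
  maxSteps = 10,
): LoopResult {
  const seenResults = new Set<string>()
  const observations: string[] = []
  for (let step = 1; step <= maxSteps; step++) {
    const next = plan(observations)
    if ('final' in next) return { answer: next.final, steps: step - 1, stoppedBy: 'final' }
    const result = callTool(next)
    if (seenResults.has(result)) {
      // Same result twice: stop searching and answer from what we already have.
      return { answer: observations.join('\n'), steps: step, stoppedBy: 'duplicate' }
    }
    seenResults.add(result)
    observations.push(result)
  }
  // Hard stop: the cap applies no matter what the planner wants to do next.
  return { answer: observations.join('\n'), steps: maxSteps, stoppedBy: 'maxSteps' }
}

// A planner that never finishes, paired with a search that always returns the same rows.
const stubbornPlanner: Planner = (obs) => ({ tool: 'search_knowledge_base', input: `try ${obs.length}` })
const sameResults = (): string => '3 results: A, B, C'
console.log(runGuardedLoop(stubbornPlanner, sameResults))
// { answer: '3 results: A, B, C', steps: 2, stoppedBy: 'duplicate' }
```

In the incident above, the duplicate check alone would have ended the loop on the second search, and the step cap would have bounded the damage even if the results had varied slightly each time.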
Production debug guide
Diagnose agent loops, tool failures, and memory issues (7 entries)
Symptom · 01
Agent enters infinite tool-calling loop
→
Fix
Check maxSteps on the agent — set to 10 for production. Add deduplication for tool results.
Symptom · 02
Agent hallucinates tool input parameters
→
Fix
Add Zod schema validation to tool input — reject invalid inputs before the tool executes
Symptom · 03
Conversation history not persisting across requests
→
Fix
Verify Supabase client is writing to the conversations table on each message — check RLS policies
Symptom · 04
Streaming response stops mid-token
→
Fix
Check that the Route Handler returns streamText().toDataStreamResponse() and the client uses useChat from Vercel AI SDK
Symptom · 05
Agent ignores tools and answers directly
→
Fix
Check the agent prompt — ensure tool descriptions are clear and the prompt instructs the agent to use tools
Symptom · 06
Vector search returns irrelevant results
→
Fix
Check embedding model consistency — the same model must be used for indexing and querying. Verify the match_threshold parameter (0.78-0.82 for text-embedding-3-small).
Symptom · 07
Token costs higher than expected
→
Fix
Check conversation history length — long histories multiply token usage. Implement summarization for conversations over 8,000 tokens.
AI Agent Quick Debug Reference
Fast commands for diagnosing agent, tool, and memory issues
Agent not calling tools
Immediate action
Check agent prompt and tool descriptions
Commands
grep -rn 'tool\|Tool\|description' lib/agent/ --include='*.ts' | head -20
cat lib/agent/tools.ts | head -60
Fix now
Verify tools are passed to createToolCallingAgent and each tool has a clear description string
If the tool involved is vector search, also verify that the match_documents RPC function exists and that embeddings were generated with the same model
Streaming not working
Immediate action
Check the Route Handler returns a streaming response
Commands
grep -rn 'streamText\|toDataStreamResponse' app/api/ --include='*.ts' | head -10
curl -s -N http://localhost:3000/api/chat -X POST -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"hello"}]}' | head -5
Fix now
Use streamText from ai package and ensure the client uses useChat hook
Token costs spiking
Immediate action
Check conversation history length and token usage
Commands
grep -rn 'tokenCount\|token_count\|usage' lib/agent/ --include='*.ts' | head -10
psql "${DATABASE_URL}" -c "SELECT conversation_id, SUM(token_count) as total FROM messages GROUP BY conversation_id ORDER BY total DESC LIMIT 10"
Fix now
Implement conversation summarization when history exceeds 8,000 tokens
Agent Frameworks Compared
| Framework | Language | Agent Type | Streaming | Tool Ecosystem | Production Ready | Best For |
| --- | --- | --- | --- | --- | --- | --- |
| LangChain | Python, JS/TS | ReAct, Tool Calling | Yes (with streamEvents) | Large | Yes* | Complex multi-tool agents, RAG pipelines |
| Vercel AI SDK | JS/TS | OpenAI Functions | Yes (native) | Small | Yes | Simple chat agents, streaming-first apps |
| CrewAI | Python | Role-based multi-agent | Limited | Medium | Growing | Multi-agent collaboration, research tasks |
| AutoGen | Python | Conversational multi-agent | Yes | Medium | Yes | Multi-agent conversations, code generation |
| Direct OpenAI API | Any | Function calling | Yes | None (manual) | Yes | Simple single-tool agents, full control |
Key takeaways
1. An agent decides tool sequences dynamically, unlike chains, which follow fixed steps.
2. maxSteps on the agent prevents infinite loops: always set it to 10 in production.
3. Supabase stores conversation history and vector embeddings: the database is the agent's memory.
4. Conversation summarization reduces token costs by 80-90% while preserving context.
5. Tool descriptions guide the LLM's tool selection: write them like instructions, not documentation.
6. Validate tool inputs with Zod: LLMs hallucinate invalid arguments in multi-step workflows.
Common mistakes to avoid
8 patterns
×
No maxSteps on the agent
Symptom
Agent enters infinite tool-calling loop. Consumes millions of tokens and thousands of dollars in API credits within hours. The agent keeps calling the same tool with slightly different inputs, never reaching a conclusion.
Fix
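Set maxSteps to 10 on the executor that runs the agent; createToolCallingAgent only builds the agent and does not enforce a step limit itself (on LangChain's AgentExecutor the option is named maxIterations). Add deduplication logic to detect repeated tool results. Set OpenAI usage limits as a safety net. Monitor per-conversation token counts.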
×
Vague tool descriptions that do not explain when to use the tool
Symptom
Agent either never uses the tool (does not know when it is relevant) or uses it incorrectly (applies it to the wrong type of query). Tool selection accuracy drops below 50%.
Fix
Write tool descriptions that explain WHEN to use the tool, not just what it does. Include examples of good queries. List what the tool should NOT be used for. Add .describe() to every Zod schema field.
×
No input validation on tool parameters
Symptom
Tool receives hallucinated inputs from the LLM — missing required fields, wrong types, malformed queries. Tool crashes with unhandled errors, and the agent gets an unclear error message.
Fix
Validate all tool inputs with Zod before execution. Return clear, actionable error messages that tell the agent what went wrong and how to retry with valid inputs.
×
Storing conversation history only in application memory
Symptom
Conversation history is lost when the serverless function terminates. Users lose context on every page refresh. Multi-turn conversations do not work across requests.
Fix
Store conversation history in Supabase PostgreSQL. Load history from the database on every request. Use a custom history class that matches your token-tracking schema.
×
No conversation summarization for long sessions
Symptom
Token costs increase linearly with conversation length. A 50-turn conversation consumes 100,000+ tokens per agent iteration. Monthly API costs exceed budget by 5-10x.
Fix
Implement conversation summarization when history exceeds 8,000 tokens. Use gpt-4o-mini for summarization (cheaper). Replace old messages with a summary in the agent context.
×
Using gpt-4o for all tasks including simple lookups
Symptom
Simple tool calls (order status lookups, basic searches) cost about 17x more than necessary ($2.50 vs $0.15 per million input tokens). Monthly API costs are dominated by trivial operations that do not require advanced reasoning.
Fix
Use gpt-4o-mini for simple tasks (order lookups, basic searches) and gpt-4o for complex reasoning (multi-tool workflows, refund calculations). Route tasks to the appropriate model based on complexity.
×
No rate limiting on the chat API endpoint
Symptom
A single user or bot sends 1,000 messages per minute, exhausting the OpenAI rate limit and causing errors for all users. API costs spike unexpectedly.
Fix
Add rate limiting middleware using Upstash Redis (sliding window) — 10 messages per minute per user is a reasonable default.
×
Not logging agent runs for observability
Symptom
When the agent produces a wrong answer or enters a loop, there is no way to debug what happened. No visibility into which tools were called, what inputs were used, or how many steps were taken.
Fix
Log every agent run with: conversation ID, input, output, intermediate steps, duration, token count, and error details. Use these logs to identify patterns in agent failures.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01 of 05 · Senior
What is the difference between an AI agent and a chain in LangChain?
ANSWER
A chain follows a fixed sequence of steps — retrieve context, format prompt, generate response. The sequence is defined at build time and does not change based on the input.
An agent decides the sequence dynamically. It uses an LLM to reason about which tool to call, calls the tool, observes the result, and decides the next step. The sequence is determined at runtime based on the input and intermediate results.
The trade-off: agents are more flexible (they can handle unexpected queries by choosing different tools) but more expensive (each step is a separate LLM call) and harder to debug (the sequence is non-deterministic). Chains are cheaper, faster, and predictable but limited to predefined workflows.
Q02 of 05 · Senior
How do you prevent an AI agent from entering an infinite loop?
ANSWER
Three layers of protection:
1. maxSteps on the agent — set to 10 for production. This is the hard stop that prevents infinite loops regardless of the agent's behavior.
2. Prompt-level instructions — tell the agent what to do when tools return no new information. Example: "If the search returns the same results twice, synthesize an answer from available information instead of searching again."
3. Tool-level deduplication — detect when the same tool is called with the same inputs and return a message like "This query was already searched. The results have not changed. Please answer based on the available information."
Additionally, set OpenAI usage limits as a financial safety net, and monitor per-conversation token counts to detect loops early.
Q03 of 05 · Senior
How do you manage conversation memory in a stateless serverless environment?
ANSWER
Serverless functions are stateless — they terminate after each request and do not retain memory. To persist conversation history across requests:
1. Store every message in a database (Supabase PostgreSQL). Each message includes the conversation ID, role, content, and timestamp.
2. On each request, load the conversation history from the database and pass it to the agent as context.
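3. Use a custom chat message history class (for example, one extending LangChain's BaseListChatMessageHistory) so that stored rows match your own schema, including any token-tracking columns.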
The key insight: the database is the memory, not the application server. Each request loads the memory, processes the message, and writes the result back.
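The load, process, write-back cycle can be sketched as follows. `MessageStore` is a hypothetical interface, and `InMemoryStore` is a stand-in so the example runs without a database; in the setup described here a Supabase table would sit behind the same two methods.

```typescript
interface StoredMessage {
  conversationId: string;
  role: "user" | "assistant";
  content: string;
  createdAt: string; // ISO timestamp
}

// Hypothetical storage interface; a database-backed class would implement the same methods.
interface MessageStore {
  load(conversationId: string): Promise<StoredMessage[]>;
  append(message: StoredMessage): Promise<void>;
}

// In-memory stand-in so the sketch runs without a database.
class InMemoryStore implements MessageStore {
  private rows: StoredMessage[] = [];

  async load(conversationId: string): Promise<StoredMessage[]> {
    // Rows are kept in insertion order; a real query would order by timestamp.
    return this.rows.filter((row) => row.conversationId === conversationId);
  }

  async append(message: StoredMessage): Promise<void> {
    this.rows.push(message);
  }
}

type Responder = (history: StoredMessage[], userText: string) => Promise<string>;

// One stateless request: load the memory, process the message, write the result back.
async function handleRequest(
  store: MessageStore,
  conversationId: string,
  userText: string,
  respond: Responder,
): Promise<string> {
  const history = await store.load(conversationId);
  const reply = await respond(history, userText);
  const createdAt = new Date().toISOString();
  await store.append({ conversationId, role: "user", content: userText, createdAt });
  await store.append({ conversationId, role: "assistant", content: reply, createdAt });
  return reply;
}
```

Nothing in `handleRequest` survives between calls, so any number of serverless instances can serve the same conversation as long as they share the store.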
Q04 of 05 · SENIOR
How do you control token costs for an AI agent in production?
ANSWER
Four strategies:
1. Conversation summarization: when history exceeds 8,000 tokens, summarize old messages with a cheaper model (gpt-4o-mini). This can cut the history's token count by 80-90% while preserving the context the agent needs.
2. Model selection: use gpt-4o-mini ($0.15/M input tokens) for simple tasks such as order lookups and basic searches, and gpt-4o ($2.50/M input tokens) for complex reasoning. Route based on task complexity.
3. Token budgets: set a per-conversation limit (e.g., 50,000 tokens). When the limit is exceeded, either summarize and continue or stop processing.
4. maxSteps: limit the number of agent steps per turn. Each step is a separate LLM call that resends the full history, so fewer steps means fewer tokens.
Additionally, track per-conversation token usage in the database, set daily/monthly budgets, and alert when spending exceeds thresholds.
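The thresholds above translate into a small decision function. This is a sketch under stated assumptions: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, all names are hypothetical, and the policy chosen at the conversation limit is "stop" (the text also allows summarizing and continuing).

```typescript
type BudgetAction = "continue" | "summarize" | "stop";

interface BudgetConfig {
  summarizeAboveHistoryTokens: number; // 8,000 in the text
  maxConversationTokens: number; // 50,000 in the text
}

const DEFAULT_BUDGET: BudgetConfig = {
  summarizeAboveHistoryTokens: 8_000,
  maxConversationTokens: 50_000,
};

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Each agent step resends the history, so input tokens grow with the step count.
// This is a lower bound: tool results appended during the turn add more.
function estimateTurnInputTokens(historyTokens: number, steps: number): number {
  return historyTokens * steps;
}

function decideBudgetAction(
  historyTokens: number,
  conversationTokensUsed: number,
  config: BudgetConfig = DEFAULT_BUDGET,
): BudgetAction {
  if (conversationTokensUsed >= config.maxConversationTokens) return "stop";
  if (historyTokens > config.summarizeAboveHistoryTokens) return "summarize";
  return "continue";
}

console.log(decideBudgetAction(9_000, 20_000));
console.log(estimateTurnInputTokens(8_000, 10));
```

Running the check before every model call, rather than once per request, lets a conversation that crosses the limit mid-turn stop at the next step instead of at the next message.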
Q05 of 05 · JUNIOR
What is the role of Supabase in an AI agent architecture?
ANSWER
Supabase serves three roles:
1. Conversation storage — PostgreSQL stores conversation history (messages table) and conversation metadata (conversations table). This enables session persistence across requests in a stateless serverless environment.
2. Vector search — pgvector stores document embeddings for the knowledge base. The match_documents RPC function performs cosine similarity search, enabling the agent to retrieve relevant documents based on semantic meaning rather than keyword matching.
3. Row Level Security — RLS policies ensure users can only access their own conversations. This provides authorization at the database level, independent of the application code.
Additionally, Supabase can store conversation summaries (for cost control), usage logs (for billing and monitoring), and agent run logs (for observability).
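To make the vector-search role concrete, here is an in-memory illustration of what a cosine similarity search computes. It is not the SQL behind match_documents, which runs inside Postgres; the function names, the threshold parameter, and the tiny two-dimensional embeddings are assumptions chosen so the example is easy to follow.

```typescript
interface DocumentRow {
  id: number;
  content: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Rank rows by similarity to the query embedding and keep the best matches.
function rankBySimilarity(
  queryEmbedding: number[],
  rows: DocumentRow[],
  matchCount: number,
  matchThreshold = 0,
): Array<{ id: number; content: string; similarity: number }> {
  return rows
    .map((row) => ({
      id: row.id,
      content: row.content,
      similarity: cosineSimilarity(queryEmbedding, row.embedding),
    }))
    .filter((match) => match.similarity >= matchThreshold)
    .sort((left, right) => right.similarity - left.similarity)
    .slice(0, matchCount);
}

const rows: DocumentRow[] = [
  { id: 1, content: "Refund policy", embedding: [1, 0] },
  { id: 2, content: "Office hours", embedding: [0, 1] },
  { id: 3, content: "Returns and refunds", embedding: [1, 1] },
];
console.log(rankBySimilarity([1, 0], rows, 2));
```

Doing this ranking in the database rather than in application code is the point of pgvector: the embeddings never leave Postgres, and an index can avoid scanning every row.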
FAQ · 5 QUESTIONS
Frequently Asked Questions
01
Can I use a different LLM instead of OpenAI with LangChain?
Yes. LangChain supports many LLM providers: Anthropic (ChatAnthropic), Google (ChatGoogleGenerativeAI), Azure OpenAI (AzureChatOpenAI), Ollama (ChatOllama for local models), and any OpenAI-compatible API. The agent framework works the same regardless of the LLM; only the model initialization changes. For function calling (tool use), confirm that the specific model supports it: OpenAI, Anthropic, and Google Gemini models have native support, while some others (for example, certain local models) may require prompt-based tool calling.
02
How do I test an AI agent in my CI/CD pipeline?
Test at three levels: unit test individual tools (mock external calls, verify output format), integration test the agent with a fixed input (verify it calls the expected tools and produces a correct answer), and end-to-end test the API endpoint (verify streaming, authentication, and error handling). For repeatable tests, set temperature to 0 (this reduces variation but does not guarantee identical output) and record and replay LLM responses with an HTTP record/replay tool such as Polly.js or nock. Mock the Supabase client for database operations.
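The record/replay idea does not depend on a particular library. The sketch below, with hypothetical names throughout, wraps a model call so that one run saves each prompt and response pair and later runs serve the saved responses without touching the network.

```typescript
type ModelCall = (prompt: string) => Promise<string>;

// Record mode: call the real model once and save each prompt/response pair.
function recording(real: ModelCall, cassette: Map<string, string>): ModelCall {
  return async (prompt) => {
    const response = await real(prompt);
    cassette.set(prompt, response);
    return response;
  };
}

// Replay mode: serve saved responses and fail loudly on prompts that were never recorded.
function replaying(cassette: ReadonlyMap<string, string>): ModelCall {
  return async (prompt) => {
    const response = cassette.get(prompt);
    if (response === undefined) {
      throw new Error(`No recorded response for prompt: ${prompt}`);
    }
    return response;
  };
}
```

Failing on an unrecorded prompt is deliberate: a changed prompt template then breaks the test instead of silently triggering a paid API call in CI. In practice the cassette would be serialized to a fixture file committed next to the test.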
03
How do I handle multi-agent architectures where agents delegate to each other?
LangChain supports agent delegation through the agent's tools — one agent can be wrapped as a tool for another agent. The supervisor agent decides which specialist agent to call (e.g., a billing agent, a technical support agent) and passes the relevant context. CrewAI and AutoGen provide higher-level abstractions for multi-agent collaboration with role definitions and conversation patterns.
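The agent-as-tool pattern can be sketched in a few lines. Everything here is hypothetical and framework-free: each specialist is reduced to a function, and a keyword rule stands in for the LLM call that a real supervisor would use to pick a specialist.

```typescript
type Agent = (request: string) => string;

interface AgentTool {
  name: string;
  description: string; // what the supervisor's model would read when choosing
  run: Agent;
}

// Wrap a specialist agent so it looks like any other tool.
function asTool(name: string, description: string, agent: Agent): AgentTool {
  return { name, description, run: agent };
}

type Chooser = (request: string, tools: AgentTool[]) => string;

// The supervisor only decides who handles the request, then delegates.
function createSupervisor(tools: AgentTool[], choose: Chooser): Agent {
  return (request) => {
    const chosenName = choose(request, tools);
    const chosen = tools.find((tool) => tool.name === chosenName);
    return chosen ? chosen.run(request) : `No specialist available for: ${request}`;
  };
}

const specialists = [
  asTool("billing", "Handles invoices and refunds", (request) => `[billing] ${request}`),
  asTool("support", "Handles technical problems", (request) => `[support] ${request}`),
];

// Keyword rule standing in for the supervisor's LLM decision.
const keywordChooser: Chooser = (request) =>
  /refund|invoice/i.test(request) ? "billing" : "support";

const supervisor = createSupervisor(specialists, keywordChooser);
console.log(supervisor("I need a refund"));
```

Because every delegation adds at least one more model call, the step limits and token budgets that apply to a single agent matter even more in this arrangement.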
04
What is the difference between gpt-4o and gpt-4o-mini for agent tasks?
gpt-4o is better at complex reasoning, multi-step planning, and understanding nuanced tool descriptions. gpt-4o-mini is faster and roughly 17x cheaper on input tokens but may struggle with complex tool selection or multi-step workflows. Use gpt-4o-mini for simple tasks (single tool calls, order lookups, basic searches) and gpt-4o for complex tasks (multi-tool workflows, refund calculations). The cost difference is significant: $0.15/M vs $2.50/M input tokens.
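The arithmetic behind that comparison is short enough to keep in code. The prices are the input-token figures quoted above and may have changed since; `chooseModel` is a hypothetical routing rule that assumes you already have a way to label a task as simple or complex.

```typescript
// Input prices in USD per million tokens, as quoted in the text; verify before relying on them.
const INPUT_PRICE_PER_MILLION_USD = {
  "gpt-4o": 2.5,
  "gpt-4o-mini": 0.15,
} as const;

type ModelName = keyof typeof INPUT_PRICE_PER_MILLION_USD;

function inputCostUsd(model: ModelName, inputTokens: number): number {
  return (inputTokens / 1_000_000) * INPUT_PRICE_PER_MILLION_USD[model];
}

// Hypothetical routing rule: the complexity label is assumed to come from your own classifier.
type Complexity = "simple" | "complex";

function chooseModel(complexity: Complexity): ModelName {
  return complexity === "complex" ? "gpt-4o" : "gpt-4o-mini";
}

const priceRatio =
  INPUT_PRICE_PER_MILLION_USD["gpt-4o"] / INPUT_PRICE_PER_MILLION_USD["gpt-4o-mini"];
console.log(`gpt-4o costs about ${priceRatio.toFixed(1)}x more per input token`);

// Example: 10 agent steps that each resend an 8,000-token history.
const tokens = 8_000 * 10;
console.log(inputCostUsd(chooseModel("complex"), tokens), inputCostUsd(chooseModel("simple"), tokens));
```

Output tokens are priced separately and are not included here, so treat the result as a floor on the real bill.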
05
How do I deploy an AI agent that exceeds Vercel's serverless timeout?
Three options: use the Vercel Edge Runtime (no Node.js APIs, so it suits pure API calls; it is not unlimited, though: at the time of writing an Edge function must start sending its response within about 25 seconds and can then keep streaming only for a bounded period, so check Vercel's current limits), move the agent logic to a separate service (AWS Lambda with a 15-minute timeout, or a containerized service on ECS/Fargate with no timeout), or implement a queue-based architecture (the user submits a message, a worker processes it asynchronously, and the client polls or receives a webhook when done). The queue approach is the most scalable but adds complexity.
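The queue-based option boils down to three operations: submit, process, and poll. The class below is an in-memory sketch with hypothetical names; a production version would keep jobs in a durable store (a database table or a managed queue) so that the worker and the web request can run in different processes.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";

interface Job {
  id: string;
  message: string;
  status: JobStatus;
  result?: string;
  error?: string;
}

class InMemoryJobQueue {
  private jobs = new Map<string, Job>();
  private pending: string[] = [];
  private nextId = 1;

  // Called by the request handler: returns immediately with an ID the client can poll.
  submit(message: string): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { id, message, status: "queued" });
    this.pending.push(id);
    return id;
  }

  // Called by the client (through a status endpoint) until the job is done or failed.
  poll(id: string): Job | undefined {
    const job = this.jobs.get(id);
    return job ? { ...job } : undefined;
  }

  // Called by the worker: runs the long agent call outside the request's time limit.
  async processNext(worker: (message: string) => Promise<string>): Promise<boolean> {
    const id = this.pending.shift();
    const job = id === undefined ? undefined : this.jobs.get(id);
    if (!job) return false;
    job.status = "running";
    try {
      job.result = await worker(job.message);
      job.status = "done";
    } catch (err) {
      job.error = err instanceof Error ? err.message : String(err);
      job.status = "failed";
    }
    return true;
  }
}
```

The request handler only ever calls `submit` and `poll`, both of which return quickly, so the serverless timeout no longer constrains how long the agent itself may run.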