Building Multi-Agent AI Systems with Next.js and LangGraph
- LangGraph models agent workflows as directed graphs: nodes execute functions, edges route conditionally, state flows between them. If you cannot draw it as a flowchart, you cannot build it as a graph.
- Decompose monolithic agents into specialists, each with 2-3 tools and a focused 200-400 token prompt. If an agent's system prompt exceeds 500 tokens, it is doing too much.
- Always set maxIterations on graph compilation: unbounded cycles burn tokens and produce no output. Add revision_count and token_budget to the state for additional guards.
- Multi-agent systems split a complex task across specialized agents (researcher, writer, reviewer), each with its own tools and prompts
- Supabase persists agent state across requests: conversation history, tool outputs, and intermediate reasoning survive restarts
- Human-in-the-loop nodes pause the graph and wait for approval before executing high-risk actions (file writes, API calls, deletions)
- Production failure: unbounded graph cycles burned 47,000 tokens when two agents debated endlessly. maxIterations is mandatory
- Biggest mistake: building one mega-agent that does everything. Specialized agents with clear boundaries are more reliable and debuggable
Production Debug Guide

Infinite loop between agents:
- npx langsmith traces --project your-project --limit 5 to list recent traces
- Check the trace timeline for repeated node executions; count the loop iterations

State lost between executions:
- SELECT * FROM langgraph_checkpoints ORDER BY created_at DESC LIMIT 5 to check recent state saves
- Verify the state schema matches what the checkpointer serializes; check for custom types

Token budget exceeded mid-execution:
- grep -rn 'maxTokens\|token' app/api/agent/ to find token configuration
- Check the LangSmith trace for total token count per execution

Human-in-the-loop node never resumes after approval:
- SELECT * FROM langgraph_checkpoints WHERE thread_id = '<thread_id>' ORDER BY created_at DESC LIMIT 1 for pending interrupts
- Verify the resume signal: graph.updateState(config, { approved: true }, 'review_node')

Client disconnects but graph keeps running:
- Check server logs for 'client disconnected' events and orphaned graph executions
- Implement cancellation: if (abortSignal.aborted) { await graph.cancel(config) }
Race conditions in parallel Send() map-reduce patterns:
- If you see race conditions, you likely have shared mutable state without proper locking

Single-agent AI systems hit a ceiling fast. One agent with access to 15 tools and a 4,000-token system prompt produces inconsistent results: it confuses tool selection, loses context on long tasks, and cannot self-correct when it makes a mistake. The fix is not more tools or longer prompts. The fix is decomposition: split the task across specialized agents, each with a narrow responsibility and a clear handoff protocol.
LangGraph provides the orchestration layer. It models agent workflows as directed graphs: nodes execute functions (agent calls, tool execution, human review), edges route based on conditional logic (was the output good enough? did the agent request a tool?), and state flows through the graph carrying conversation history, tool outputs, and intermediate results. This graph model enables patterns that single-agent systems cannot achieve: retry loops, parallel execution, human approval gates, and graceful degradation when one agent fails.
The production stack for this article: Next.js 16 as the application framework, LangGraph for agent orchestration, Supabase for state persistence, and the Vercel AI SDK for streaming the graph execution to the client. The patterns apply to any LLM provider: OpenAI, Anthropic, or self-hosted models.
LangGraph Fundamentals: Nodes, Edges, and State
LangGraph models agent workflows as a directed graph. Three primitives compose every graph: nodes, edges, and state. Nodes are functions that execute a step: call an LLM, run a tool, wait for human input, or transform data. Edges define the routing logic: conditional branches based on the output of the previous node. State is the data that flows through the graph: conversation history, tool outputs, intermediate reasoning, and metadata.
The graph is compiled into a runnable that accepts input and produces output. The compilation step validates the graph structure: all nodes are reachable, all edges have valid targets, and the state schema is consistent. Compilation also accepts a checkpointer that persists state after each node execution, enabling pause/resume, human-in-the-loop, and crash recovery.
The key insight: LangGraph is not an agent framework; it is a workflow engine. It does not define how an agent thinks. It defines the order, conditions, and data flow between steps. You bring the agents (functions that call LLMs), the tools (functions that do work), and the routing logic (conditional edges). LangGraph orchestrates them.
State management is the hardest part. The state object must be serializable (for checkpointing), typed (for correctness), and minimal (for token efficiency). Store only what the graph needs to make routing decisions and what agents need as context. Do not dump the entire conversation history into every node; pass only the relevant slice.
import { StateGraph, Annotation, START, END } from '@langchain/langgraph';
import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import { createClient } from '@supabase/supabase-js';
import { SupabaseSaver } from './checkpointers/supabase';

// Define the graph state: only what the graph needs.
// Minimal state = fewer tokens = lower cost = faster execution
const GraphState = Annotation.Root({
  // Input
  query: Annotation<string>,
  // Agent outputs
  research: Annotation<string>,
  analysis: Annotation<string>,
  finalReport: Annotation<string>,
  // Control flow
  approved: Annotation<boolean>,
  revisionCount: Annotation<number>,
  tokenBudget: Annotation<number>,
  currentStep: Annotation<string>,
  // Metadata
  errors: Annotation<string[]>,
  startTime: Annotation<number>,
});

// Node: Research agent gathers information
async function researchNode(state: typeof GraphState.State) {
  const llm = new ChatOpenAI({ model: 'gpt-4o', temperature: 0 });
  const response = await llm.invoke([
    new SystemMessage('You are a research agent. Gather relevant information for the query. Be concise and factual.'),
    new HumanMessage(`Research the following topic: ${state.query}`),
  ]);
  return {
    research: response.content as string,
    currentStep: 'research_complete',
    tokenBudget: state.tokenBudget - (response.usage_metadata?.total_tokens ?? 0),
  };
}

// Node: Analysis agent processes research into insights
async function analysisNode(state: typeof GraphState.State) {
  const llm = new ChatOpenAI({ model: 'gpt-4o', temperature: 0 });
  const response = await llm.invoke([
    new SystemMessage('You are an analysis agent. Extract key insights from the research. Identify patterns, contradictions, and gaps.'),
    new HumanMessage(`Analyze this research:\n\n${state.research}`),
  ]);
  return {
    analysis: response.content as string,
    currentStep: 'analysis_complete',
    tokenBudget: state.tokenBudget - (response.usage_metadata?.total_tokens ?? 0),
  };
}

// Node: Report writer synthesizes analysis into a report
async function writerNode(state: typeof GraphState.State) {
  const llm = new ChatOpenAI({ model: 'gpt-4o', temperature: 0.3 });
  const response = await llm.invoke([
    new SystemMessage('You are a report writer. Synthesize the analysis into a clear, actionable report.'),
    new HumanMessage(`Write a report based on this analysis:\n\n${state.analysis}`),
  ]);
  return {
    finalReport: response.content as string,
    currentStep: 'report_complete',
    revisionCount: state.revisionCount + 1,
    tokenBudget: state.tokenBudget - (response.usage_metadata?.total_tokens ?? 0),
  };
}

// Conditional edge: route based on approval and budget
function shouldContinue(state: typeof GraphState.State): string {
  // Force-stop if budget exhausted
  if (state.tokenBudget <= 0) {
    return 'end';
  }
  // Force-stop after 3 revisions
  if (state.revisionCount >= 3) {
    return 'end';
  }
  // Loop back for another revision pass if not approved
  if (!state.approved) {
    return 'writer';
  }
  return 'end';
}

// Build the graph
const graph = new StateGraph(GraphState)
  .addNode('researcher', researchNode)
  .addNode('analyzer', analysisNode)
  .addNode('writer', writerNode)
  .addEdge(START, 'researcher')
  .addEdge('researcher', 'analyzer')
  .addEdge('analyzer', 'writer')
  .addConditionalEdges('writer', shouldContinue, {
    writer: 'researcher', // not approved: loop back for another pass
    end: END,
  })
  .compile({
    checkpointer: new SupabaseSaver({
      client: createClient(
        process.env.SUPABASE_URL!,
        process.env.SUPABASE_SERVICE_KEY!,
      ),
      tableName: 'langgraph_checkpoints',
    }),
    // Hard cap on total node executions: pass { recursionLimit } in the
    // run config when invoking the compiled graph to prevent infinite loops
  });

export { graph, GraphState };
- Nodes are functions: call an LLM, run a tool, wait for human input, or transform data
- Edges are routing logic: conditional branches based on the output of the previous node
- State is the data that flows through the graph: keep it minimal for token efficiency
- Checkpointer persists state after each node: enables pause/resume, human-in-the-loop, crash recovery
- Compilation validates the graph structure: all nodes reachable, all edges valid, state schema consistent
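A minimal sketch of this execution model in plain TypeScript (all names hypothetical, not the LangGraph API) makes the node/edge/state contract concrete: nodes transform state, edges pick the next node, and a hard iteration cap bounds the loop:

```typescript
// Minimal workflow-engine sketch: nodes transform state, edges route conditionally.
// Illustrative only; not the LangGraph API.
type State = { query: string; output: string; hops: number };
type Node = (s: State) => State;
type Router = (s: State) => string; // returns next node name, or 'end'

function runGraph(
  nodes: Record<string, Node>,
  edges: Record<string, Router>,
  start: string,
  state: State,
  maxIterations = 10, // hard cap: prevents unbounded cycles
): State {
  let current = start;
  for (let i = 0; i < maxIterations; i++) {
    state = nodes[current](state);      // node executes a step
    const next = edges[current](state); // edge routes based on the new state
    if (next === 'end') return state;
    current = next;
  }
  throw new Error('maxIterations exceeded: possible infinite loop');
}

// Two-node example: research, then write; writer loops back until 3 hops done.
const result = runGraph(
  {
    research: (s) => ({ ...s, output: `facts about ${s.query}`, hops: s.hops + 1 }),
    write: (s) => ({ ...s, output: `report: ${s.output}`, hops: s.hops + 1 }),
  },
  {
    research: () => 'write',
    write: (s) => (s.hops < 3 ? 'research' : 'end'),
  },
  'research',
  { query: 'LangGraph', output: '', hops: 0 },
);
// result.output === 'report: facts about LangGraph' after 4 node executions
```

If you cannot express your workflow in this shape, the graph design is not done yet.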
- Parallel fan-out: Send() to parallel nodes, fan-in with a reducer node

Multi-Agent Architecture: Decomposition Over Monoliths
The monolith agent pattern (one agent with 15 tools and a 4,000-token system prompt) fails in production for three reasons. First, tool selection degrades as the tool count increases: the agent confuses similar tools and selects the wrong one. Second, context window pressure: the system prompt, conversation history, and tool descriptions compete for limited context. Third, debugging is opaque: when the monolith produces a bad output, you cannot identify which reasoning step failed.
Multi-agent architecture solves all three problems through decomposition. Each agent has a narrow responsibility: researcher (gathers data), analyzer (extracts insights), writer (produces output), reviewer (validates quality). Each agent has 2-3 tools maximum. Each agent's system prompt is 200-400 tokens focused on one task. The graph orchestrates the handoffs.
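The 200-400 token prompt budget can be enforced mechanically. A rough sketch using the common chars/4 heuristic (not a real tokenizer; the 500-token ceiling follows the rule of thumb above):

```typescript
// Rough prompt-budget guard for the "200-400 token prompt, 500-token ceiling" rule.
// chars/4 is a crude English-text heuristic, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function checkPromptBudget(role: string, systemPrompt: string, maxTokens = 500): void {
  const estimate = estimateTokens(systemPrompt);
  if (estimate > maxTokens) {
    throw new Error(
      `${role}: ~${estimate} tokens exceeds the ${maxTokens}-token ceiling; split the agent`,
    );
  }
}

// Passes: a focused specialist prompt is well under the ceiling
checkPromptBudget(
  'researcher',
  'You are a research agent. You gather facts and return them in a structured format.',
);
```

Run this as a unit test over every specialist config so prompt bloat fails CI instead of degrading tool selection in production.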
The production pattern: define agent boundaries by capability, not by data domain. A 'researcher' agent searches the web, reads documents, and extracts facts, regardless of whether the topic is finance, medicine, or engineering. An 'analyzer' agent identifies patterns, contradictions, and gaps, regardless of the data source. This separation means you can swap the researcher's tools without affecting the analyzer's logic.
The supervisor pattern is the most common multi-agent topology. A supervisor agent receives the user's request, decomposes it into subtasks, routes each subtask to the appropriate specialist agent, and synthesizes the results. The supervisor does not do the work; it orchestrates. This is analogous to a project manager who assigns tasks to engineers, not an engineer who does everything.
import { ChatOpenAI } from '@langchain/openai';
import { SystemMessage, HumanMessage } from '@langchain/core/messages';
import { z } from 'zod';

// Supervisor agent: decomposes tasks and routes to specialists.
// Does NOT do the work; orchestrates specialists.
const TaskSchema = z.object({
  agent: z.enum(['researcher', 'analyzer', 'writer', 'reviewer'])
    .describe('Which specialist agent should handle this task'),
  task: z.string().describe('Specific instruction for the specialist agent'),
  priority: z.enum(['high', 'medium', 'low']).describe('Task priority: high tasks run first'),
});

const DecompositionSchema = z.object({
  tasks: z.array(TaskSchema).describe('List of tasks to distribute to specialist agents'),
  reasoning: z.string().describe('Why this decomposition was chosen'),
});

export async function supervisorDecompose(query: string) {
  const llm = new ChatOpenAI({ model: 'gpt-4o', temperature: 0 });
  // Bind the structured output schema: forces the LLM to return valid JSON
  const structuredLlm = llm.withStructuredOutput(DecompositionSchema);
  const result = await structuredLlm.invoke([
    new SystemMessage(`You are a supervisor agent. Your job is to decompose complex tasks into subtasks and assign each to the appropriate specialist.

Specialists available:
- researcher: Gathers information from web search, documents, and databases. Best for: factual questions, data collection, source verification.
- analyzer: Extracts patterns, contradictions, and gaps from provided data. Best for: critical analysis, comparison, summarization.
- writer: Produces polished output (reports, emails, code). Best for: synthesis, formatting, communication.
- reviewer: Validates output quality, checks facts, identifies errors. Best for: quality assurance, fact-checking, compliance.

Rules:
- Decompose the query into 2-5 subtasks maximum
- Each subtask targets exactly one specialist
- High-priority tasks run first
- If the query is simple (one-step), assign to a single specialist`),
    new HumanMessage(`Decompose this task: ${query}`),
  ]);
  return result;
}

// Specialist agent factory: each agent has a narrow tool set and focused prompt
export function createSpecialistAgent(role: 'researcher' | 'analyzer' | 'writer' | 'reviewer') {
  const configs = {
    researcher: {
      systemPrompt: 'You are a research agent. You gather information from available sources. You do not analyze or write reports; you collect facts and return them in a structured format.',
      tools: ['web_search', 'document_reader'],
      model: 'gpt-4o',
      temperature: 0,
    },
    analyzer: {
      systemPrompt: 'You are an analysis agent. You examine provided data and extract key insights, patterns, contradictions, and gaps. You do not gather new data or write final reports.',
      tools: ['calculator', 'comparison_tool'],
      model: 'gpt-4o',
      temperature: 0,
    },
    writer: {
      systemPrompt: 'You are a writing agent. You synthesize provided analysis into clear, well-structured output. You do not gather data or perform analysis; you write based on what is provided.',
      tools: ['markdown_formatter'],
      model: 'gpt-4o',
      temperature: 0.3,
    },
    reviewer: {
      systemPrompt: 'You are a review agent. You validate output quality by checking facts, identifying errors, and assessing completeness. You approve or reject with specific feedback. If the output meets 80% of requirements, approve it.',
      tools: ['fact_checker'],
      model: 'gpt-4o',
      temperature: 0,
    },
  };

  const config = configs[role];
  const llm = new ChatOpenAI({ model: config.model, temperature: config.temperature });

  return {
    role,
    invoke: async (task: string, context: string) => {
      return llm.invoke([
        new SystemMessage(config.systemPrompt),
        new HumanMessage(`Task: ${task}\n\nContext:\n${context}`),
      ]);
    },
  };
}
State Persistence: Supabase as the Graph Memory Layer
LangGraph's checkpointer interface persists graph state after each node execution. Without a checkpointer, state lives in memory: lost on restart, unavailable for multi-turn conversations, and impossible to debug after the fact. Supabase provides a Postgres-backed checkpointer that survives restarts, supports concurrent access, and enables SQL queries against historical state.
The checkpointer stores three things: the current state snapshot (serialized graph state), the write-ahead log (sequence of state updates), and the metadata (thread_id, node_name, timestamp). The thread_id is the primary key; it groups all state snapshots for a single conversation or workflow execution.
The production pattern: use a dedicated Supabase table for checkpoints with a composite index on (thread_id, created_at). After each node execution, the checkpointer writes the full state snapshot. On resume (human-in-the-loop, crash recovery, or multi-turn conversation), the checkpointer loads the latest snapshot for the thread_id and the graph resumes from that point.
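Under these assumptions, the checkpoint table might look like the following; the column names mirror the checkpointer fields described here, not an official LangGraph schema:

```sql
-- Checkpoint table sketch for the Supabase checkpointer (names assumed)
create table if not exists langgraph_checkpoints (
  thread_id text not null,             -- groups all snapshots for one conversation
  checkpoint_id uuid not null,         -- one row per node execution
  parent_checkpoint_id uuid,           -- previous snapshot, for history walks
  state jsonb not null,                -- serialized graph state
  metadata jsonb not null,             -- node name, step, writes
  created_at timestamptz not null default now(),
  primary key (thread_id, checkpoint_id)
);

-- Composite index: "latest snapshot for a thread" is the hot query
create index if not exists idx_checkpoints_thread_time
  on langgraph_checkpoints (thread_id, created_at desc);
```

The descending index matches the ORDER BY created_at DESC LIMIT 1 resume query, so loading the latest snapshot stays an index-only lookup as threads grow.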
State serialization is the hidden complexity. The graph state may contain complex types: Message objects, tool call results, custom classes. The checkpointer must serialize these to JSON for storage and deserialize them on load. LangChain's message serialization handles Message objects, but custom types need explicit serialization hooks. If serialization fails silently, the restored state is incomplete: agents receive partial context and produce incorrect outputs.
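A cheap guard against silent serialization loss is a JSON round-trip comparison run in development. A sketch (it catches Dates, functions, and class instances that degrade under plain JSON, though not every lossy type):

```typescript
// Round-trip guard: flags state fields that JSON serialization silently changes
// (functions vanish, Dates become strings, undefined disappears).
// Development-time sketch; real checkpointers use richer serializers for Message objects.
function findLossyFields(state: Record<string, unknown>): string[] {
  const restored = JSON.parse(JSON.stringify(state)) as Record<string, unknown>;
  const lossy: string[] = [];
  for (const key of Object.keys(state)) {
    const before = state[key];
    const after = restored[key];
    if (typeof before !== typeof after) {
      lossy.push(key); // type changed (Date -> string) or vanished (function -> undefined)
    } else if (JSON.stringify(before) !== JSON.stringify(after)) {
      lossy.push(key); // content changed across the round-trip
    }
  }
  return lossy;
}
```

Run it against a representative state object in a test; any reported field needs an explicit serialization hook before the checkpointer can be trusted with it.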
import { BaseCheckpointSaver, Checkpoint, CheckpointMetadata } from '@langchain/langgraph';
import { SupabaseClient } from '@supabase/supabase-js';

interface SupabaseSaverConfig {
  client: SupabaseClient;
  tableName?: string;
}

// Config shape LangGraph passes in: thread_id identifies the conversation,
// checkpoint_id (optional) identifies a specific snapshot
interface SaverRunConfig {
  configurable: { thread_id: string; checkpoint_id?: string };
}

// Supabase-backed checkpointer for LangGraph.
// Persists graph state after each node execution; survives restarts.
export class SupabaseSaver extends BaseCheckpointSaver {
  private client: SupabaseClient;
  private tableName: string;

  constructor(config: SupabaseSaverConfig) {
    super();
    this.client = config.client;
    this.tableName = config.tableName ?? 'langgraph_checkpoints';
  }

  // Get the latest checkpoint for a thread
  async getTuple(config: SaverRunConfig) {
    const { data, error } = await this.client
      .from(this.tableName)
      .select('*')
      .eq('thread_id', config.configurable.thread_id)
      .order('created_at', { ascending: false })
      .limit(1)
      .single();
    if (error || !data) {
      return undefined;
    }
    return {
      config: { configurable: { thread_id: data.thread_id, checkpoint_id: data.checkpoint_id } },
      checkpoint: JSON.parse(data.state) as Checkpoint,
      metadata: JSON.parse(data.metadata) as CheckpointMetadata,
      parentConfig: data.parent_checkpoint_id
        ? { configurable: { thread_id: data.thread_id, checkpoint_id: data.parent_checkpoint_id } }
        : undefined,
    };
  }

  // List all checkpoints for a thread: for debugging and audit
  async *list(config: SaverRunConfig) {
    const { data, error } = await this.client
      .from(this.tableName)
      .select('*')
      .eq('thread_id', config.configurable.thread_id)
      .order('created_at', { ascending: false });
    if (error || !data) {
      return;
    }
    for (const row of data) {
      yield {
        config: { configurable: { thread_id: row.thread_id, checkpoint_id: row.checkpoint_id } },
        checkpoint: JSON.parse(row.state) as Checkpoint,
        metadata: JSON.parse(row.metadata) as CheckpointMetadata,
        parentConfig: row.parent_checkpoint_id
          ? { configurable: { thread_id: row.thread_id, checkpoint_id: row.parent_checkpoint_id } }
          : undefined,
      };
    }
  }

  // Save a checkpoint: called after each node execution
  async put(config: SaverRunConfig, checkpoint: Checkpoint, metadata: CheckpointMetadata) {
    const checkpointId = checkpoint.id ?? crypto.randomUUID();
    const { error } = await this.client
      .from(this.tableName)
      .upsert({
        thread_id: config.configurable.thread_id,
        checkpoint_id: checkpointId,
        parent_checkpoint_id: config.configurable.checkpoint_id ?? null,
        state: JSON.stringify(checkpoint),
        metadata: JSON.stringify(metadata),
        created_at: new Date().toISOString(),
      });
    if (error) {
      console.error('Failed to save checkpoint:', error);
      throw new Error(`Checkpoint save failed: ${error.message}`);
    }
    return { configurable: { thread_id: config.configurable.thread_id, checkpoint_id: checkpointId } };
  }
}
- list(): query all checkpoints for a thread, reconstruct execution history

Human-in-the-Loop: Approval Gates for High-Risk Actions
Some agent actions are too risky to execute without human review. Deleting files, sending emails, making API calls with side effects, or generating legal documents: these need a human approval gate before execution. LangGraph's interrupt mechanism provides this: the graph pauses at a specific node, saves the state, and waits for a resume signal with the human's decision.
The pattern: the agent proposes an action, the graph interrupts and presents the proposal to the user, the user approves or rejects, and the graph resumes with the decision in the state. The conditional edge after the interrupt node routes based on the approval status β execute if approved, revise if rejected, terminate if the user cancels.
The production consideration: the interrupt-resume cycle must be atomic. The state at the interrupt point must be exactly what the user sees, and the resume must restore that exact state. If the state changes between interrupt and resume (e.g., another process modifies the database), the agent may execute an action based on stale context.
The UX challenge is presenting the proposal clearly. The user needs to understand what the agent wants to do, why, and what the consequences are. A raw JSON dump of the proposed action is not sufficient. The agent should generate a human-readable summary of the proposed action, and the UI should present it with approve/reject buttons and an optional feedback field for rejections.
'use client';

import { useState } from 'react';
import { Button } from '@/components/ui/button';
import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card';
import { Textarea } from '@/components/ui/textarea';

interface ApprovalRequest {
  threadId: string;
  nodeName: string;
  proposal: string;
  riskLevel: 'low' | 'medium' | 'high';
  actionType: string;
  details: Record<string, unknown>;
}

interface HumanApprovalProps {
  request: ApprovalRequest;
  onApprove: (threadId: string, feedback?: string) => Promise<void>;
  onReject: (threadId: string, feedback: string) => Promise<void>;
}

// Human-in-the-loop approval UI.
// The graph pauses at an interrupt node and waits for this component to send a resume signal.
export function HumanApproval({ request, onApprove, onReject }: HumanApprovalProps) {
  const [feedback, setFeedback] = useState('');
  const [isSubmitting, setIsSubmitting] = useState(false);

  const riskColors = {
    low: 'bg-green-500/10 text-green-700 border-green-500/20',
    medium: 'bg-yellow-500/10 text-yellow-700 border-yellow-500/20',
    high: 'bg-red-500/10 text-red-700 border-red-500/20',
  };

  const handleApprove = async () => {
    setIsSubmitting(true);
    try {
      await onApprove(request.threadId, feedback || undefined);
    } finally {
      setIsSubmitting(false);
    }
  };

  const handleReject = async () => {
    if (!feedback.trim()) {
      return; // Rejection requires feedback: the agent needs to know why
    }
    setIsSubmitting(true);
    try {
      await onReject(request.threadId, feedback);
    } finally {
      setIsSubmitting(false);
    }
  };

  return (
    <Card className="border-l-4 border-l-yellow-500">
      <CardHeader>
        <div className="flex items-center justify-between">
          <CardTitle className="text-lg">Approval Required</CardTitle>
          <span className={`rounded-full px-3 py-1 text-xs font-medium border ${riskColors[request.riskLevel]}`}>
            {request.riskLevel.toUpperCase()} RISK
          </span>
        </div>
        <CardDescription>
          The agent wants to perform: <strong>{request.actionType}</strong>
        </CardDescription>
      </CardHeader>
      <CardContent className="space-y-4">
        {/* Human-readable proposal, not raw JSON */}
        <div className="rounded-md bg-muted p-4">
          <p className="text-sm whitespace-pre-wrap">{request.proposal}</p>
        </div>

        {/* Feedback field: required for rejection, optional for approval */}
        <div className="space-y-2">
          <label className="text-sm font-medium">Feedback (required for rejection)</label>
          <Textarea
            value={feedback}
            onChange={(e) => setFeedback(e.target.value)}
            placeholder="Explain why you are rejecting or provide additional context..."
            rows={3}
          />
        </div>

        {/* Action buttons */}
        <div className="flex gap-3 justify-end">
          <Button
            variant="outline"
            onClick={handleReject}
            disabled={isSubmitting || !feedback.trim()}
          >
            Reject
          </Button>
          <Button onClick={handleApprove} disabled={isSubmitting}>
            Approve
          </Button>
        </div>
      </CardContent>
    </Card>
  );
}
- Interrupt node pauses the graph and saves the state: the agent's proposal is frozen in time
- Human reviews the proposal in the UI: approve or reject with feedback
- Resume signal carries the decision back to the graph: conditional edge routes based on approval
- The state between interrupt and resume must be atomic: no external modifications
- Rejection feedback goes back to the agent as context: it revises and proposes again
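The routing step after the interrupt can be sketched as a pure function; the decision payload shape here is an assumption, not a LangGraph type:

```typescript
// Routing after a human decision: mirrors the conditional edge after an interrupt node.
// Sketch only; the Decision shape is assumed, not a LangGraph type.
type Decision = { status: 'approved' | 'rejected' | 'cancelled'; feedback?: string };

function routeAfterReview(decision: Decision, revisionCount: number, maxRevisions = 3): string {
  if (decision.status === 'cancelled') return 'end';    // user bailed out: stop cleanly
  if (decision.status === 'approved') return 'execute'; // run the high-risk action
  if (revisionCount >= maxRevisions) return 'end';      // don't revise forever
  if (!decision.feedback?.trim()) {
    throw new Error('Rejection requires feedback: the agent needs to know why');
  }
  return 'revise'; // feedback goes back to the agent as context
}
```

Keeping this logic in a pure function makes the approve/reject/cancel paths unit-testable without running the graph.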
Streaming Graph Execution to the Client
Multi-agent graph execution can take 10-60 seconds: multiple LLM calls, tool executions, and conditional routing add up. Without streaming, the user sees a blank screen for the entire duration. With streaming, the user sees each node's output as it executes: the researcher's findings appear first, then the analyzer's insights, then the writer's report.
LangGraph supports streaming via the graph.stream() method, which yields events as each node completes. Each event contains the node name, the state update, and the metadata. The Next.js Route Handler pipes these events to the client via a ReadableStream, and the client renders each update as it arrives.
The production pattern: stream three levels of information. Level 1, node status: which agent is currently executing (show a status indicator: 'Researching...', 'Analyzing...', 'Writing...'). Level 2, node output: the agent's response as it generates (stream tokens from the LLM call). Level 3, graph metadata: iteration count, token usage, and routing decisions (for debugging dashboards).
The UX consideration: do not show raw graph events to users. Transform them into a conversation-like interface where each agent's contribution appears as a message. The user sees a coherent narrative, not a debugging log.
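That transformation can be a small pure function mapping node updates to user-facing messages; the labels and state fields here mirror the research/analyze/write example earlier and are illustrative:

```typescript
// Turn raw graph events into conversation-style messages: users see
// 'Researching...' rather than a node-update log. Labels are illustrative.
const NODE_LABELS: Record<string, string> = {
  researcher: 'Researching...',
  analyzer: 'Analyzing...',
  writer: 'Writing...',
};

type NodeEvent = { node: string; state: Record<string, unknown> };
type UiMessage = { role: 'agent'; label: string; content: string };

function toUiMessage(event: NodeEvent): UiMessage {
  // Surface the one field each agent produces; fall back to a generic label
  const content = (event.state.research ??
    event.state.analysis ??
    event.state.finalReport ??
    '') as string;
  return {
    role: 'agent',
    label: NODE_LABELS[event.node] ?? `Running ${event.node}...`,
    content,
  };
}
```

The fallback label matters: when you add a node to the graph, the UI degrades gracefully instead of crashing on an unknown node name.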
Critical production concern: client disconnections. If the user navigates away or refreshes the page mid-execution, the graph may continue running server-side, consuming tokens without a client to receive the output. Implement AbortSignal handling to cancel graph execution when the client disconnects.
import { NextRequest } from 'next/server';
import { graph } from '@/io/thecodeforge/multi-agent/lib/graphs/research-graph';

// Route Handler: streams graph execution events to the client.
// Each node's output appears as it completes; no blank screen.
export async function POST(req: NextRequest) {
  const { query, threadId } = await req.json();

  if (!query || !threadId) {
    return Response.json({ error: 'query and threadId are required' }, { status: 400 });
  }

  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        // Stream graph execution: yields events as each node completes
        const graphStream = await graph.stream(
          {
            query,
            tokenBudget: 10000,
            revisionCount: 0,
            approved: false,
            errors: [],
            startTime: Date.now(),
          },
          {
            configurable: { thread_id: threadId },
            // Stream mode: 'updates' yields state updates per node
            streamMode: 'updates',
          },
        );

        for await (const event of graphStream) {
          // Check if the client disconnected.
          // Note: in production, pass the request's AbortSignal through
          // and stop streaming when abortSignal.aborted is true.

          // Each event is { nodeName: stateUpdate }
          for (const [nodeName, stateUpdate] of Object.entries(event)) {
            const data = JSON.stringify({
              type: 'node_update',
              node: nodeName,
              state: stateUpdate,
              timestamp: Date.now(),
            });
            controller.enqueue(encoder.encode(`data: ${data}\n\n`));
          }
        }

        // Stream complete
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: 'done' })}\n\n`));
        controller.close();
      } catch (error) {
        const errorMessage = error instanceof Error ? error.message : 'Unknown error';
        controller.enqueue(
          encoder.encode(`data: ${JSON.stringify({ type: 'error', message: errorMessage })}\n\n`),
        );
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}
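On the client, the event-stream frames need to be reassembled from arbitrary network chunks. A minimal parser sketch (frames delimited by a blank line, matching the `data: ...` format a handler like the one above emits):

```typescript
// Client-side parser for server-sent-event frames arriving in arbitrary chunks.
// Frames are delimited by a blank line ("\n\n"); partial frames stay buffered.
function createSseParser(onEvent: (e: Record<string, unknown>) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    let boundary: number;
    while ((boundary = buffer.indexOf('\n\n')) !== -1) {
      const frame = buffer.slice(0, boundary); // one complete frame
      buffer = buffer.slice(boundary + 2);     // keep any partial frame
      if (frame.startsWith('data: ')) {
        onEvent(JSON.parse(frame.slice(6)));
      }
    }
  };
}
```

Feed it each decoded chunk from the fetch response's reader; events only fire once a frame is complete, so a JSON payload split across packets never reaches JSON.parse half-formed.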
Deployment and Observability: LangSmith Tracing in Production
Multi-agent systems are harder to debug than single-agent systems. When a single agent produces bad output, you review one prompt and one response. When a multi-agent graph produces bad output, you must trace the entire execution: which agent was called, in what order, what each agent received, what each agent produced, and where the routing logic sent the output next.
LangSmith provides distributed tracing for LangGraph executions. Each graph run produces a trace with a tree of spans: one span per node, one span per LLM call, one span per tool execution. The trace shows the full execution path, the state at each node, the token usage, and the latency. This is essential for debugging production failures.
The production pattern: enable LangSmith tracing in the graph's configuration. Each trace is tagged with metadata: user_id, thread_id, graph_name, and environment (staging/production). Use the LangSmith dashboard to filter traces by tag, search for specific node outputs, and compare successful runs against failed runs.
The observability budget matters. LangSmith charges per trace. A multi-agent graph with 5 nodes and 2 retry loops produces 10+ spans per execution. At 1,000 daily executions, that is 10,000+ spans per day. Sample traces in production: log 100% in staging, 10% in production, and 100% of error traces.
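One hedge on the sampling mechanics: sampling with Math.random per request makes the keep/drop decision differ across retries of the same thread. Hashing the thread_id makes it deterministic per execution; a sketch:

```typescript
// Deterministic trace sampling: hash the thread_id so every retry and span of
// one execution shares the same keep/drop decision. FNV-1a hash; sketch only.
function shouldSampleTrace(threadId: string, rate: number): boolean {
  let h = 0x811c9dc5;
  for (let i = 0; i < threadId.length; i++) {
    h ^= threadId.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  // Map the hash to [0, 1) and compare against the rate
  return (h % 10_000) / 10_000 < rate; // rate = 1.0 in staging, 0.1 in production
}
```

Error traces bypass this check entirely; they are always logged regardless of the sample rate.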
Cold start considerations for Vercel serverless. Each graph execution is a serverless function invocation. Cold starts add 1-3 seconds to the first node execution. For graphs that exceed the serverless timeout (300 seconds on Pro), use a background job pattern with webhook callbacks.
import { Client } from 'langsmith';

// LangSmith tracing configuration for production multi-agent systems.
// Enable tracing, tag with metadata, sample for cost control.

export function createTracingConfig(options: {
  userId: string;
  threadId: string;
  graphName: string;
  environment: 'staging' | 'production';
}) {
  const isProduction = options.environment === 'production';

  // Sample rate: 100% in staging, 10% in production.
  // Error traces are always logged (handled in the graph's error handler).
  const shouldTrace = !isProduction || Math.random() < 0.1;

  if (!shouldTrace) {
    return { tracingEnabled: false };
  }

  return {
    tracingEnabled: true,
    // LangSmith callbacks are configured via environment variables:
    // LANGCHAIN_TRACING_V2=true
    // LANGCHAIN_API_KEY=...
    // LANGCHAIN_PROJECT=your-project-name
    callbacks: [
      // Metadata tags for filtering in the LangSmith dashboard
      {
        handleLLMStart: async (llm: unknown, prompts: string[], runId: string) => {
          // Tags are set at the run level, not per-span.
          // Use the LangSmith client to update the run with metadata.
        },
      },
    ],
    metadata: {
      user_id: options.userId,
      thread_id: options.threadId,
      graph_name: options.graphName,
      environment: options.environment,
      // Custom tags for filtering
      tags: [options.graphName, options.environment, `user:${options.userId}`],
    },
  };
}

// Error trace logger: always logs 100% of errors regardless of sample rate
export async function logErrorTrace(
  client: Client,
  error: Error,
  context: {
    userId: string;
    threadId: string;
    graphName: string;
    nodeName: string;
    state: Record<string, unknown>;
  },
) {
  await client.createRun({
    name: `error:${context.graphName}:${context.nodeName}`,
    runType: 'chain',
    inputs: {
      error: error.message,
      stack: error.stack,
      state: context.state,
    },
    tags: ['error', context.graphName, context.nodeName],
    metadata: {
      user_id: context.userId,
      thread_id: context.threadId,
      graph_name: context.graphName,
      node_name: context.nodeName,
      environment: process.env.NODE_ENV,
    },
  });
}
Testing Multi-Agent Graphs
Testing multi-agent systems requires a different strategy than single-agent tests. You need to verify the graph structure, state transitions, loop termination, and end-to-end behavior. Three testing layers cover the behavioral failure modes, and a structural check covers the graph topology itself.
Unit tests: test individual nodes in isolation. Mock the LLM client and verify that the node transforms input state to output state correctly. Use a test runner like Vitest or Jest to assert that researchNode returns the expected keys (research, currentStep, tokenBudget) for a given input state.
Integration tests: test state transitions and routing. Run the graph with a fixed thread_id and verify that conditional edges route correctly. Test loop termination by setting maxIterations to a low value (e.g., 2) and asserting that the graph terminates. Use a test Supabase database with seeded state.
End-to-end tests: test the full user journey. Simulate a user request end-to-end and assert on the final state. Use LangSmith mock clients to record traces for debugging. Verify that the final output contains expected content and that the token budget was not exceeded.
Visual testing: verify graph structure. Use graph.getGraph().drawMermaidPng() to generate a visualization of the graph and assert that it matches the expected topology. This catches structural bugs like missing edges or unreachable nodes.
```typescript
import { describe, it, expect, beforeEach, vi } from 'vitest';
import { graph, GraphState } from './research-graph';

// Mock the LLM to return predictable output
vi.mock('@langchain/openai', () => ({
  ChatOpenAI: vi.fn().mockImplementation(() => ({
    invoke: vi.fn().mockResolvedValue({
      content: 'Mocked research output',
      usage_metadata: { total_tokens: 100 },
    }),
  })),
}));

describe('Research Graph', () => {
  const threadId = 'test-thread-123';

  it('should execute the full graph and produce a final report', async () => {
    const initialState = {
      query: 'What is LangGraph?',
      tokenBudget: 10000,
      revisionCount: 0,
      approved: false,
      errors: [],
      startTime: Date.now(),
    };

    const result = await graph.invoke(initialState, {
      configurable: { thread_id: threadId },
    });

    expect(result.finalReport).toBeDefined();
    expect(result.currentStep).toBe('report_complete');
  });

  it('should terminate after maxIterations to prevent infinite loops', async () => {
    const initialState = {
      query: 'Test query',
      tokenBudget: 10000,
      revisionCount: 0,
      approved: false, // Always triggers revision loop
      errors: [],
      startTime: Date.now(),
    };

    // Run with very low maxIterations to test termination
    // In real tests, compile the graph with maxIterations: 3
    // Here we just verify the graph eventually terminates
    let iterations = 0;
    for await (const _ of graph.stream(initialState, {
      configurable: { thread_id: `${threadId}-loop-test` },
    })) {
      iterations++;
      if (iterations > 20) {
        throw new Error('Graph did not terminate - infinite loop detected');
      }
    }

    expect(iterations).toBeLessThanOrEqual(20);
  });

  it('should persist state to Supabase after each node', async () => {
    // This test requires a test Supabase instance
    // Verify that after each node execution, a checkpoint is created
    const initialState = {
      query: 'State persistence test',
      tokenBudget: 5000,
      revisionCount: 0,
      approved: true, // Skip revision loop
      errors: [],
      startTime: Date.now(),
    };

    await graph.invoke(initialState, {
      configurable: { thread_id: `${threadId}-checkpoint-test` },
    });

    // Query Supabase to verify checkpoints were saved
    // const checkpoints = await supabase.from('langgraph_checkpoints')...
    // expect(checkpoints.length).toBeGreaterThan(0);
  });
});
```
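The visual-testing layer can also be asserted structurally without rendering an image. A sketch with a pure diffing helper - the edge lists below are illustrative, and in a real test `actual` would be derived from the compiled graph's getGraph() output:

```typescript
// Structural check: compare the compiled graph's edges against the expected
// topology. The lists below are illustrative; derive `actual` from
// graph.getGraph() (or by parsing its Mermaid output) in a real test.
type Edge = [from: string, to: string];

// Pure helper: which expected edges are absent from the actual graph?
function missingEdges(actual: Edge[], expected: Edge[]): Edge[] {
  const seen = new Set(actual.map(([a, b]) => `${a}->${b}`));
  return expected.filter(([a, b]) => !seen.has(`${a}->${b}`));
}

const expected: Edge[] = [
  ['supervisor', 'researcher'],
  ['researcher', 'writer'],
  ['writer', 'reviewer'],
];
const actual: Edge[] = [
  ['supervisor', 'researcher'],
  ['researcher', 'writer'],
];

// One expected edge is absent - exactly the kind of structural bug
// (missing edge, unreachable node) this layer should flag.
console.log(missingEdges(actual, expected)); // → [ [ 'writer', 'reviewer' ] ]
```

Asserting `missingEdges(...).length === 0` in a test turns a hard-to-spot wiring mistake into an immediate failure.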
| Aspect | Single Agent | Multi-Agent (LangGraph) |
|---|---|---|
| Tool count | 1-5 tools – manageable selection accuracy | 2-3 tools per agent – each specialist has a narrow tool set |
| System prompt size | 200-500 tokens – focused on one task | 200-400 tokens per agent – focused prompts, total context shared across agents |
| Debugging | One prompt, one response – easy to trace | Distributed trace across nodes – requires LangSmith or equivalent |
| Self-correction | Agent may retry but has no structured revision loop | Reviewer agent + conditional edge enables structured revision cycles |
| Human oversight | Difficult to gate specific actions | Interrupt nodes pause at specific points – targeted approval gates |
| Cost | One LLM call per request | Multiple LLM calls per request – 3-10x token usage |
| Latency | 5-15 seconds for single response | 15-60 seconds for full graph execution |
| Best for | Simple Q&A, single-step tasks, chatbots | Research pipelines, content workflows, multi-step analysis, code generation with review |
🎯 Key Takeaways
- LangGraph models agent workflows as directed graphs – nodes execute functions, edges route conditionally, state flows between them. If you cannot draw it as a flowchart, you cannot build it as a graph.
- Decompose monolith agents into specialists – each with 2-3 tools and a focused 200-400 token prompt. If an agent's system prompt exceeds 500 tokens, it is doing too much.
- Always set maxIterations on graph compilation – unbounded cycles burn tokens and produce no output. Add revision_count and token_budget to the state for additional guards.
- Supabase checkpointer persists graph state after each node – survives restarts, enables multi-turn conversations, and supports debugging via SQL queries.
- Human-in-the-loop nodes pause the graph for high-risk actions – rejection requires feedback so the agent can revise. If your agent can delete data without approval, you have a liability.
- Stream graph execution at three levels: node status, node output, graph metadata. If your system takes 30 seconds and shows nothing, users assume it is broken.
- Handle client disconnections – use AbortSignal to cancel orphaned executions. Use background jobs for graphs exceeding serverless timeouts.
⚠️ Common Mistakes to Avoid
Interview Questions on This Topic
- Q: Explain the difference between a single-agent system and a multi-agent system. When would you choose one over the other? (Mid-level)
- Q: How does LangGraph prevent infinite loops in a multi-agent system? Walk me through the safeguards you would implement. (Senior)
- Q: What is the role of a checkpointer in LangGraph, and why is Supabase a good choice for production? (Mid-level)
- Q: How would you implement a human-in-the-loop approval gate in a LangGraph multi-agent system? (Senior)
- Q: What is the supervisor pattern in multi-agent systems, and when would you use it over a sequential pipeline? (Mid-level)
- Q: How do you handle client disconnections during a long-running graph execution? (Senior)
Frequently Asked Questions
Can I use LangGraph with Anthropic Claude instead of OpenAI?
Yes. LangGraph is model-agnostic – it orchestrates functions, not specific LLMs. Swap ChatOpenAI for ChatAnthropic in the node functions. The graph structure, state management, and checkpointing work identically. The only difference is the LLM call itself and the response format. LangChain provides unified interfaces for both providers.
How much does a multi-agent system cost compared to a single-agent system?
A multi-agent system typically costs 3-10x more in tokens per request. A single-agent request with GPT-4o costs approximately $0.01-0.03. A multi-agent graph with 4 nodes (supervisor + 3 specialists) costs approximately $0.05-0.15 per request. The cost scales with the number of LLM calls, not the complexity of the task. Mitigate cost by using gpt-4o-mini for non-critical agents, setting maxTokens per agent, and using conditional routing to skip unnecessary agents.
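The arithmetic behind these estimates can be made explicit with a small cost model. The prices and token counts below are illustrative assumptions, not current provider pricing - check your provider's price sheet:

```typescript
// Back-of-envelope cost model for one graph execution.
// Prices are illustrative (USD per 1M tokens) - verify against your provider.
const PRICES: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};

interface NodeUsage {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

function estimateCost(nodes: NodeUsage[]): number {
  return nodes.reduce((sum, n) => {
    const p = PRICES[n.model];
    return sum + (n.inputTokens * p.input + n.outputTokens * p.output) / 1_000_000;
  }, 0);
}

// Supervisor on gpt-4o, three specialists downgraded to gpt-4o-mini:
const cost = estimateCost([
  { model: 'gpt-4o', inputTokens: 2000, outputTokens: 500 },
  { model: 'gpt-4o-mini', inputTokens: 3000, outputTokens: 1000 },
  { model: 'gpt-4o-mini', inputTokens: 3000, outputTokens: 1000 },
  { model: 'gpt-4o-mini', inputTokens: 4000, outputTokens: 2000 },
]);
console.log(cost.toFixed(4)); // → 0.0139
```

Running the same four nodes entirely on gpt-4o multiplies the cost severalfold, which is why per-agent model selection is the highest-leverage cost control.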
Do I need LangSmith for production, or can I use other observability tools?
LangSmith is the native tracing tool for LangChain/LangGraph – it provides automatic span creation for every LLM call, tool execution, and graph node. Alternatives exist (Langfuse, OpenLLMetry, custom OpenTelemetry) but require manual instrumentation. LangSmith is recommended for LangGraph projects because the integration is zero-config – set two environment variables and every graph execution is traced automatically.
How do I handle rate limiting across multiple agents that all call the same LLM provider?
Each agent's LLM call counts against the same provider rate limit. A graph with 4 agents makes 4 calls per execution – if your provider limit is 500 requests per minute, each graph run consumes 4 of those 500. Implement application-layer rate limiting with a shared token bucket (Upstash Redis) that tracks all LLM calls from all agents. Add retry logic with Retry-After header parsing on 429 responses. Consider staggering agent execution (sequential instead of parallel) if rate limits are tight.
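A minimal in-process token bucket illustrates the mechanics. This sketch is single-process only; in production you would back the bucket with a shared store such as Upstash Redis, as noted above, so the limit holds across serverless instances:

```typescript
// Minimal in-process token bucket shared by every agent's LLM call.
// Single-process sketch only - a shared store (e.g. Upstash Redis) is needed
// for the limit to hold across serverless instances.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }

  private refill() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
  }

  // Returns true if a request may proceed, consuming one token.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 500 requests/minute shared across all agents:
const llmBucket = new TokenBucket(500, 500 / 60);

// Inside each node, before the LLM call:
// if (!llmBucket.tryAcquire()) {
//   // wait and retry - and honor Retry-After on any 429 the provider returns
// }
```

Because every agent draws from the same bucket, a four-agent graph cannot silently quadruple your request rate past the provider limit.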
Can I deploy a LangGraph multi-agent system to Vercel serverless?
Yes, with caveats. Each graph execution is a serverless function invocation. Set maxDuration to 60-300 seconds depending on your plan. The Supabase checkpointer persists state between invocations – the graph can pause (human-in-the-loop) and resume in a separate invocation. Cold starts add 1-3 seconds to the first node execution. For graphs that exceed the serverless timeout, use a background job pattern (Inngest, Qstash) with a webhook callback.
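A sketch of the route segment config this implies, assuming an App Router handler at a path like app/api/agent/route.ts (the path and the 300-second value are assumptions; maxDuration limits depend on your Vercel plan):

```typescript
// app/api/agent/route.ts - Next.js App Router route segment config.
// 300s requires a paid Vercel plan; the Hobby plan caps maxDuration lower.
export const maxDuration = 300; // seconds the function may run
export const dynamic = 'force-dynamic'; // never cache agent responses

export async function POST(req: Request) {
  // ...invoke the graph here, streaming results back to the client,
  // with the Supabase checkpointer persisting state between invocations
  return new Response('ok');
}
```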
How do I test a multi-agent graph?
Test at three layers: (1) Unit tests – test individual nodes in isolation with mocked LLM clients, verify state transformation. (2) Integration tests – run the graph with a fixed thread_id, test conditional edge routing, verify loop termination with low maxIterations. (3) End-to-end tests – simulate full user journeys, assert on final state and output quality. Use graph.getGraph().drawMermaidPng() for visual testing – assert the topology matches expectations to catch missing edges or unreachable nodes.
What happens if the user disconnects mid-execution?
Without handling, the graph continues running server-side, consuming tokens with no client to receive the output. Implement AbortSignal handling: pass req.signal to graph.stream(), check abortSignal.aborted in the stream loop, and call graph.cancel(config) when aborted. For long-running graphs (>60s), use background jobs (Inngest/Qstash) instead of serverless – trigger via API, receive callback on completion.
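The abort pattern can be sketched in isolation. `fakeGraphStream` below is a hypothetical stand-in for graph.stream() so the loop logic is visible without LangGraph; in a real route handler you would pass req.signal rather than creating your own AbortController:

```typescript
// Sketch: stop consuming (and paying for) a streamed graph run once the
// client disconnects. `fakeGraphStream` is a stand-in for graph.stream().
async function* fakeGraphStream(signal: AbortSignal) {
  for (let step = 1; step <= 10; step++) {
    if (signal.aborted) return; // stop between nodes once the client is gone
    await new Promise((r) => setTimeout(r, 10)); // simulate node work
    yield { step };
  }
}

async function runWithAbort(signal: AbortSignal): Promise<number> {
  let completed = 0;
  for await (const chunk of fakeGraphStream(signal)) {
    completed = chunk.step;
    if (signal.aborted) break; // also bail mid-stream
  }
  return completed; // steps finished before cancellation
}

// In a route handler, `controller.signal` would be req.signal instead.
const controller = new AbortController();
setTimeout(() => controller.abort(), 35); // simulate a client disconnect
runWithAbort(controller.signal).then((steps) => {
  console.log(`completed ${steps} of 10 steps before abort`);
});
```

Because the checkpointer has already persisted state for each completed node, an aborted run can later be resumed or inspected rather than being lost.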
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.