Core components: A tool has a name (identifier), description (LLM's usage guide), and args_schema (Pydantic model for validation).
Execution flow: The LLM emits a tool_calls request; the ToolExecutor or LangGraph node executes the function and returns the result.
Production value: Transforms a frozen LLM into an agent that can fetch real-time data, perform calculations, and trigger side-effects.
Critical insight: The tool's description is the LLM's primary prompt. A vague or incorrect description is the #1 cause of agent failure.
Performance lever: Modern models support parallel tool calls. Batching requests in a single LLM response drastically reduces end-to-end latency.
Biggest mistake: Treating tools as simple functions. They are API contracts for a non-human agent and require defensive design, validation, and error handling.
Plain-English First
Imagine you hired a brilliant assistant who knows everything from books — but they're locked in a room with no phone, no computer, no way to check today's weather or your calendar. LangChain Tools are the doors you cut into that room. Each door leads somewhere useful: one to Google, one to a calculator, one to your company database. Now your assistant can actually DO things in the real world, not just recite facts from memory.
An LLM in isolation is a reasoning engine with no connection to the live environment. It cannot verify current facts, execute transactions, or interact with proprietary systems. This limitation makes vanilla LLMs unsuitable for most production applications where actions, not just answers, are required.
LangChain Tools provide a structured interface to bridge this gap. They are not merely function wrappers; they are a formal contract between the orchestration layer and the LLM. The contract specifies what action can be taken, when it should be used, and what inputs it requires. The LLM's role is to parse user intent and select the appropriate tool; the framework's role is to execute it safely.
A common misconception is that tools give the LLM direct access to APIs. In reality, the LLM never executes code. It only outputs a structured request. The security boundary remains intact: the orchestration layer (your code) retains full control over execution, validation, and error handling. Understanding this separation is critical for building secure, reliable agents.
How LangChain Tools Actually Work Under the Hood
A LangChain Tool is not magic — it's a Python callable wrapped in a metadata contract. That contract has three mandatory fields: a name (a short snake_case identifier the LLM uses to invoke it), a description (the natural-language prompt that tells the LLM WHEN and WHY to use this tool), and an args_schema (a Pydantic model that enforces what arguments are valid). That's the entire surface area. Everything else is implementation.
When you bind tools to a chat model using .bind_tools(), LangChain serializes those Pydantic schemas into JSON Schema and injects them into the system prompt or into the model's tools parameter (depending on the provider). The LLM sees a list of callable 'functions' in its context window. When it decides to use one, it returns an AIMessage with a tool_calls attribute — a list of structured dicts containing the tool name and arguments. Crucially, the LLM does NOT execute anything. It just declares intent.
The ToolExecutor (or a LangGraph node) picks up those tool_calls, routes each one to the matching Python function, runs it, wraps the result in a ToolMessage, and appends it back to the conversation history. The model then reads that ToolMessage and continues reasoning. This request-execute-observe loop is the entire foundation of ReAct-style agents.
The description field is more important than most developers realize. It IS the tool's API documentation for the LLM. A vague description causes the model to call the wrong tool, call it with wrong arguments, or hallucinate that a tool exists. Treat descriptions like you'd treat a well-written docstring that a new engineer has to act on without asking questions.
tool_internals_demo.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
from langchain_core.tools import tool
from langchain_openai importChatOpenAIfrom pydantic importBaseModel, Fieldimport json
# --- Step 1: Define a strict input schema with Pydantic ---# This schema becomes JSON Schema that gets sent to the LLM.# Field descriptions are part of what the LLM reads to understand how to call this.classStockLookupInput(BaseModel):
ticker: str = Field(
description="The stock ticker symbol in uppercase, e.g. 'AAPL'or'MSFT'"
)
metric: str = Field(
description="The metric to retrieve. One of: 'price', 'pe_ratio', 'market_cap'"
)
# --- Step 2: Decorate with @tool and provide a rich description ---# The description is the LLM's only guide for WHEN to use this.# Be specific: tell it what it returns and when NOT to use it.
@tool(args_schema=StockLookupInput)
defget_stock_metric(ticker: str, metric: str) -> str:
"""
Retrieve a real-time financial metric for a publicly traded stock.
Use this when the user asks about current stock prices, P/E ratios,
or market capitalisation. DoNOT use this for historical data or crypto.
Returns a formatted string with the value and currency where applicable.
"""
# Simulated data store — in production this would call a financial API
mock_data = {
"AAPL": {"price": "$189.43", "pe_ratio": "31.2", "market_cap": "$2.94T"},
"MSFT": {"price": "$415.61", "pe_ratio": "36.8", "market_cap": "$3.08T"},
}
ticker = ticker.upper()
if ticker notin mock_data:
# Return a clear error string — never raise inside a tool unless you want# the agent to crash. Return errors as strings so the LLM can reason about them.return f"Ticker '{ticker}'not found. Available tickers: AAPL, MSFT"if metric notin mock_data[ticker]:
return f"Metric '{metric}'is invalid. Valid options: price, pe_ratio, market_cap"return f"{ticker} {metric}: {mock_data[ticker][metric]}"# --- Step 3: Inspect what the LLM actually sees ---# This is the JSON Schema injected into the model's context.print("=== Tool Name ===")
print(get_stock_metric.name) # 'get_stock_metric'print("\n=== Tool Description (what LLM reads) ===")
print(get_stock_metric.description)
print("\n=== Args Schema (sent as JSON Schema to the model) ===")
print(json.dumps(get_stock_metric.args_schema.model_json_schema(), indent=2))
# --- Step 4: Bind tool to model and observe the raw tool_call output ---
llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools([get_stock_metric])
response = llm_with_tools.invoke("What's Apple's current P/E ratio?")
print("\n=== AIMessage tool_calls (raw LLM output — no execution yet) ===")
for tc in response.tool_calls:
print(json.dumps(tc, indent=2))
# --- Step 5: Manually execute the tool call ---# In production an agent loop or LangGraph node does this automatically.
tool_call = response.tool_calls[0]
result = get_stock_metric.invoke(tool_call["args"])
print("\n=== Tool Execution Result ===")
print(result)
Output
=== Tool Name ===
get_stock_metric
=== Tool Description (what LLM reads) ===
Retrieve a real-time financial metric for a publicly traded stock.
Use this when the user asks about current stock prices, P/E ratios,
or market capitalisation. Do NOT use this for historical data or crypto.
Returns a formatted string with the value and currency where applicable.
=== Args Schema (sent as JSON Schema to the model) ===
{
"properties": {
"ticker": {
"description": "The stock ticker symbol in uppercase, e.g. 'AAPL' or 'MSFT'",
"title": "Ticker",
"type": "string"
},
"metric": {
"description": "The metric to retrieve. One of: 'price', 'pe_ratio', 'market_cap'",
If your tool description says 'searches the web' but your tool actually only searches internal docs, the LLM will confidently call it for general web queries and be confused by the results. Write descriptions like a contract: what it does, what it returns, and explicitly what it does NOT do. That last part is often the difference between a working agent and one that loops forever.
Production Insight
Cause: A tool description acts as the LLM's sole instruction for tool selection. Effect: Ambiguity or inaccuracy in the description directly causes incorrect tool invocation, argument errors, or agent loops. Action: Treat tool descriptions as critical API documentation. Use a three-part structure: 1) Clear purpose, 2) Specific use-case trigger, 3) Explicit exclusion criteria. Test descriptions by having another LLM generate test queries and verify correct tool selection.
Key Takeaway
The tool's name is its API endpoint for the LLM. The description is its documentation. The args_schema is its contract. Skimping on any of these three creates a flaky agent. The most production-critical part is often the exclusion clause in the description.
Should This Be a Tool or Just a Function?
IfThe function requires real-time external data (API call, DB query).
→
UseYes, make it a tool. The LLM cannot access this data itself.
IfThe function performs a side-effect (send email, update record).
→
UseYes, make it a tool. This requires explicit agent intent and human oversight potential.
IfThe function is pure logic/formatting that can be done from context.
→
UseNo. Keep it as a regular function called in your code. Adding unnecessary tools increases LLM decision complexity and latency.
Building Production-Grade Custom Tools with Validation and Error Handling
The @tool decorator is convenient for simple cases, but in production you'll want BaseTool subclasses. They give you explicit control over sync vs async execution, fine-grained error handling via handle_tool_error, and the ability to inject dependencies (like database sessions or API clients) at construction time rather than using module-level globals.
The key architectural decision is: should your tool raise exceptions or return error strings? The answer depends on your agent architecture. In a simple ReAct loop, returning a descriptive error string lets the LLM reason about the failure and potentially retry with different arguments — which is usually what you want. Raising an exception bubbles up and typically terminates the agent run unless you've configured handle_tool_error=True on the executor.
Dependency injection into tools is something most tutorials skip entirely, and it's where production systems diverge from toy examples. You almost never want API keys or database connections defined at module scope inside a tool. Instead, pass them into the tool's __init__ and store them as instance attributes. This makes your tools testable (you can inject mocks), configurable per-tenant, and avoids the subtle bug where a module is imported once and caches stale credentials.
Another critical pattern is idempotency awareness. If your tool sends an email or writes to a database, you need to understand that agents can and do call tools multiple times — either due to retry logic, parallel tool calls, or the model second-guessing itself. Design write operations to be idempotent or add deduplication logic at the tool level.
production_custom_tool.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
from langchain_core.tools importBaseToolfrom pydantic importBaseModel, Field, field_validator
from typing importOptional, Typeimport httpx
import hashlib
import time
# --- Input schema with built-in validation ---classWeatherQueryInput(BaseModel):
city: str = Field(description="City name to get weather for, e.g. 'London'or'Tokyo'")
units: str = Field(
default="celsius",
description="Temperature units: 'celsius'or'fahrenheit'"
)
# Pydantic v2 validator — catches bad input BEFORE the tool runs
@field_validator("units")
@classmethod
defvalidate_units(cls, value: str) -> str:
allowed = {"celsius", "fahrenheit"}
if value.lower() notin allowed:
raiseValueError(f"Units must be one of {allowed}, got '{value}'")
return value.lower()
@field_validator("city")
@classmethod
defvalidate_city_name(cls, value: str) -> str:
# Prevent prompt injection via city nameiflen(value) > 100ornot value.replace(" ", "").replace("-", "").isalpha():
raiseValueError("City name contains invalid characters")
return value.strip().title()
# --- Production BaseTool with dependency injection ---classWeatherTool(BaseTool):
name: str = "get_current_weather"
description: str = (
"Retrieve current weather conditions for a city. ""Use this when the user asks about today's weather, temperature, ""or current conditions in a specific city. ""Returns temperature and a brief condition description. ""Do NOT use for forecasts — only current conditions."
)
args_schema: Type[BaseModel] = WeatherQueryInput# Injected dependencies — passed at construction, not module-scope globals
api_key: str
base_url: str = "https://api.openweathermap.org/data/2.5"
request_timeout: int = 5# Simple in-memory deduplication to prevent redundant API calls# In production, use Redis with TTL instead
_call_cache: dict = {}
def_make_cache_key(self, city: str, units: str) -> str:
"""Cache key includes a time bucket so we don't serve stale data"""# Round to nearest 10-minute window for caching
time_bucket = int(time.time() / 600)
raw = f"{city}:{units}:{time_bucket}"return hashlib.md5(raw.encode()).hexdigest()
def_run(self, city: str, units: str = "celsius") -> str:
"""
Synchronous execution path. LangChain calls this when the tool
is invoked in a non-async context.
"""
cache_key = self._make_cache_key(city, units)
if cache_key inself._call_cache:
print(f"[WeatherTool] Cache HIT for {city}")
returnself._call_cache[cache_key]
# Map our friendly unit names to the API's expected values
api_units = "metric"if units == "celsius"else"imperial"
unit_symbol = "°C"if units == "celsius"else"°F"try:
response = httpx.get(
f"{self.base_url}/weather",
params={"q": city, "appid": self.api_key, "units": api_units},
timeout=self.request_timeout,
)
# Don't raise on 404 — return a useful string so the LLM can handle itif response.status_code == 404:
return f"City '{city}'not found. Check the spelling andtry again."
response.raise_for_status() # Raise on 5xx errors
data = response.json()
temp = data["main"]["temp"]
condition = data["weather"][0]["description"]
humidity = data["main"]["humidity"]
result = f"{city}: {temp}{unit_symbol}, {condition}, humidity {humidity}%"# Cache the successful resultself._call_cache[cache_key] = result
return result
except httpx.TimeoutException:
# Return recoverable error — LLM can retry or inform userreturn f"Weather service timed out after {self.request_timeout}s. Try again shortly."except httpx.HTTPStatusErroras exc:
return f"Weather API error (HTTP {exc.response.status_code}). Service may be down."asyncdef_arun(self, city: str, units: str = "celsius") -> str:
"""
Async execution path — called when the agent runs in an async context.
Always implement both _run and _arun in production tools.
Using httpx.AsyncClient here for true non-blocking IO.
"""
cache_key = self._make_cache_key(city, units)
if cache_key inself._call_cache:
returnself._call_cache[cache_key]
api_units = "metric"if units == "celsius"else"imperial"
unit_symbol = "°C"if units == "celsius"else"°F"asyncwith httpx.AsyncClient(timeout=self.request_timeout) as client:
try:
response = await client.get(
f"{self.base_url}/weather",
params={"q": city, "appid": self.api_key, "units": api_units},
)
if response.status_code == 404:
return f"City '{city}'not found."
response.raise_for_status()
data = response.json()
temp = data["main"]["temp"]
condition = data["weather"][0]["description"]
humidity = data["main"]["humidity"]
result = f"{city}: {temp}{unit_symbol}, {condition}, humidity {humidity}%"self._call_cache[cache_key] = result
return result
except httpx.TimeoutException:
return"Weather service timed out. Please retry."# --- Constructing the tool with injected config ---# In a real app, api_key comes from environment variables or a secrets manager
weather_tool = WeatherTool(
api_key="your-openweathermap-api-key",
request_timeout=8
)
# --- Direct invocation test (no LLM needed) ---
result = weather_tool.invoke({"city": "London", "units": "celsius"})
print(result)
# --- Test validation catches bad input cleanly ---try:
weather_tool.invoke({"city": "London", "units": "kelvin"})
exceptExceptionas e:
print(f"Validation caught: {e}")
Output
London: 14.2°C, light rain, humidity 82%
Validation caught: 1 validation error for WeatherQueryInput
units
Value error, Units must be one of {'celsius', 'fahrenheit'}, got 'kelvin' [type=value_error, ...]
Pro Tip: Always Implement _arun
If you only implement _run and your tool gets used in an async LangGraph graph, LangChain will run it in a thread pool via asyncio.run_in_executor. That's fine for I/O-bound work but burns threads under load. Implement _arun with a proper async HTTP client (httpx.AsyncClient, aiohttp) for any tool that makes network calls — it's the difference between 50 concurrent users and 5.
Production Insight
Cause: Using module-global resources (API keys, DB pools) inside tools creates hidden state and makes testing impossible. Effect: Tools become untestable, credentials can go stale, and multi-tenant isolation breaks. Action: Use BaseTool with dependency injection. Pass all external dependencies via __init__. This enables mock injection for unit tests, per-request credential rotation, and clean separation of concerns.
Key Takeaway
A production tool is not just a function with a decorator. It's a service client with its own lifecycle, configuration, and error contract. Use BaseTool for dependency injection, implement both _run and _arun, and decide your error strategy upfront: strings for LLM reasoning, exceptions for structured handling.
Error Handling Strategy: Return String vs Raise Exception
IfAgent is a simple ReAct loop with no special error handling.
→
UseReturn a descriptive error string. The LLM can reason about it and potentially adjust its approach.
IfAgent is in a LangGraph with a dedicated error-handling node or human-in-the-loop.
→
UseRaise exceptions. Use handle_tool_error=True on the ToolNode to catch them and route to your error handler.
IfThe error is a permanent configuration issue (missing API key, invalid permissions).
→
UseRaise immediately. Do not let the agent retry. This is a programmer error, not a runtime error.
ToolExecutor in LangGraph: Wiring Tools Into a Real Agent Loop
LangGraph replaced the legacy AgentExecutor as the recommended way to build agents with LangChain, and the reason is control. AgentExecutor was a black box — hard to debug, hard to add conditional logic, and nearly impossible to add human-in-the-loop approval steps. LangGraph makes the agent loop an explicit, inspectable graph where you define exactly what happens at each node.
The core pattern is a two-node graph: a model_node that calls the LLM with tools bound, and a tools_node that executes whatever tool calls the model requested. A conditional edge between them asks: 'Did the model output any tool calls?' If yes, route to the tools node. If no (meaning the model produced a final answer), route to END. That loop is the entire agent.
ToolNode (from langgraph.prebuilt) handles the boilerplate of extracting tool_calls from the last AIMessage, routing each call to the correct tool by name, running them (optionally in parallel), and wrapping results in ToolMessageobjects. The messages_modifier pattern means all of this state flows through a single messages key in your graph state, making the full conversation history trivially inspectable at any point.
The real power comes when you need to break out of the simple loop: you can add a human_approval_node before the tools node that pauses execution and waits for a user confirmation before running destructive operations. You can add a retry_node that detects tool error strings and re-prompts the model differently. You can add a max_iterations counter in your graph state to prevent infinite loops — something AgentExecutor handled poorly.
langgraph_tool_agent.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
from typing importAnnotated, Sequencefrom langchain_core.messages importBaseMessage, HumanMessage, AIMessagefrom langchain_core.tools import tool
from langchain_openai importChatOpenAIfrom langgraph.graph importStateGraph, ENDfrom langgraph.graph.message import add_messages
from langgraph.prebuilt importToolNodefrom pydantic importBaseModelimport operator
# --- Define the tools our agent can use ---
@tool
defsearch_product_catalog(query: str) -> str:
"""
Search the internal product catalog for items matching a description.
Use this when the user asks what products we carry or wants to find
a specific item. Returns product names, IDs, and prices.
"""
# Simulated catalog — real version would query a vector store or DB
catalog = [
{"id": "P001", "name": "Wireless Keyboard", "price": 79.99},
{"id": "P002", "name": "Ergonomic Mouse", "price": 49.99},
{"id": "P003", "name": "USB-C Hub", "price": 34.99},
]
query_lower = query.lower()
matches = [
f"{p['name']} (ID: {p['id']}, ${p['price']})"for p in catalog
ifany(word in p["name"].lower() for word in query_lower.split())
]
ifnot matches:
return f"No products found matching '{query}'. Try broader search terms."return"Found: " + ", ".join(matches)
@tool
defcheck_inventory(product_id: str) -> str:
"""
Check real-time stock levels for a product by its ID.
Use this AFTER searching the catalog to get a product ID.
Returns stock count and estimated restock date if out of stock.
"""
inventory = {
"P001": {"stock": 143, "restock": None},
"P002": {"stock": 0, "restock": "2025-09-15"},
"P003": {"stock": 27, "restock": None},
}
product_id = product_id.upper()
if product_id notin inventory:
return f"Product ID '{product_id}'not recognised in inventory system."
item = inventory[product_id]
if item["stock"] == 0:
return f"{product_id}: OUT OF STOCK. Expected restock: {item['restock']}"return f"{product_id}: {item['stock']} units in stock."# Collect all tools — ToolNode needs this list to route calls by name
available_tools = [search_product_catalog, check_inventory]
# --- Define graph state ---# Annotated with add_messages reducer: new messages are APPENDED, not overwritten# This is critical — without the reducer you'd lose conversation history on each stepclassAgentState(BaseModel):
messages: Annotated[Sequence[BaseMessage], add_messages]
iteration_count: int = 0# Guard against infinite loops
MAX_ITERATIONS: int = 10# Class constant baked into state typeclassConfig:
arbitrary_types_allowed = True# --- Wire up LLM with tools ---
catalog_llm = ChatOpenAI(model="gpt-4o", temperature=0)
catalog_llm_with_tools = catalog_llm.bind_tools(available_tools)
# --- Node 1: Model node — calls the LLM and returns its response ---defcall_model_node(state: AgentState) -> dict:
"""
This node calls the LLM. It reads the full message history from state
and returns the new AIMessage. LangGraph merges this with existing messages
via the add_messages reducer.
"""
current_iteration = state.iteration_count + 1print(f"[Agent] Model call — iteration {current_iteration}")
# Safety valve: if we've looped too many times, force a stopif current_iteration >= state.MAX_ITERATIONS:
from langchain_core.messages importAIMessage
forced_stop = AIMessage(
content="I've reached the maximum number of steps. ""Here's what I found so far based on available information."
)
return {"messages": [forced_stop], "iteration_count": current_iteration}
response = catalog_llm_with_tools.invoke(state.messages)
return {"messages": [response], "iteration_count": current_iteration}
# --- Node 2: ToolNode — executes all tool_calls from the last AIMessage ---# ToolNode handles parallel execution automatically when multiple tools are called
tool_executor_node = ToolNode(tools=available_tools)
# --- Routing function: decide whether to use a tool or end ---defshould_continue_routing(state: AgentState) -> str:
"""
Conditional edge function. Inspects the last message:
- If it has tool_calls -> route to tools node
- Otherwise -> route to END (agent has its final answer)
"""
last_message = state.messages[-1]
# isinstance check ensures we're looking at an AIMessage, not a ToolMessageifisinstance(last_message, AIMessage) and last_message.tool_calls:
print(f"[Router] Routing to tools: {[tc['name'] for tc in last_message.tool_calls]}")
return"use_tools"print("[Router] No tool calls — agent finished")
return"end"# --- Build the graph ---
agent_graph = StateGraph(AgentState)
# Register nodes
agent_graph.add_node("model", call_model_node)
agent_graph.add_node("tools", tool_executor_node)
# Set entry point
agent_graph.set_entry_point("model")
# Conditional edge from model: either call tools or finish
agent_graph.add_conditional_edges(
"model",
should_continue_routing,
{
"use_tools": "tools", # route name -> node name"end": END,
},
)
# After tools execute, always go back to model to process results
agent_graph.add_edge("tools", "model")
# Compile into a runnable
runnable_agent = agent_graph.compile()
# --- Run the agent ---print("=== STARTING AGENT ===")
initial_state = {"messages": [HumanMessage(content="Do you have any keyboards in stock?")]}
final_state = runnable_agent.invoke(initial_state)
print("\n=== FINAL ANSWER ===")
print(final_state["messages"][-1].content)
print(f"Total iterations: {final_state['iteration_count']}")
Output
=== STARTING AGENT ===
[Agent] Model call — iteration 1
[Router] Routing to tools: ['search_product_catalog']
[Agent] Model call — iteration 2
[Router] Routing to tools: ['check_inventory']
[Agent] Model call — iteration 3
[Router] No tool calls — agent finished
=== FINAL ANSWER ===
Yes! We carry a Wireless Keyboard (ID: P001, priced at $79.99), and it's well-stocked with 143 units available. Would you like to place an order?
Total iterations: 3
Interview Gold: AgentExecutor vs LangGraph
Interviewers love asking why you'd choose LangGraph over the legacy AgentExecutor. The answer: LangGraph gives you an explicit, inspectable state machine where YOU control the loop. You can add human-approval nodes, conditional branching based on tool output content, parallel tool execution, and persistent checkpointing mid-run. AgentExecutor is a black box that runs until it decides it's done — you have very limited control over what happens in between.
Production Insight
Cause: The legacy AgentExecutor is an opaque loop with limited hooks for customization. Effect: You cannot easily add human approval, implement complex retry logic, or inject stateful middleware without hacky callbacks. Action: Use LangGraph. Define your agent as a StateGraph. This gives you explicit control nodes, conditional edges, and the ability to inspect/modify state at any step. It's the difference between a vending machine and a kitchen.
Key Takeaway
LangGraph transforms the agent loop from a black box into a white-box state machine. The core pattern is a model_node and a tools_node with a conditional edge. Start simple, then add nodes for human-in-the-loop, error recovery, and state management. This explicit control is non-negotiable for production agents.
When to Add a Node to Your Agent Graph
IfTool performs a destructive or costly operation (delete, purchase, send).
→
UseAdd a human_approval_node before the tools_node. Pause and wait for confirmation.
IfTool calls frequently fail with recoverable errors.
→
UseAdd a retry_node after the tools_node. Check ToolMessage content for error strings and re-prompt the LLM with adjusted context.
IfAgent conversations become long and context limits are hit.
→
UseAdd a summarization_node. Trigger it when message count exceeds a threshold. Summarize history and replace the messages list.
Parallel Tool Calls, Streaming, and Performance at Scale
Modern models (GPT-4o, Claude 3.5, Gemini 1.5 Pro) can emit multiple tool calls in a single response. This is a huge performance win — if the model needs both the weather in London AND today's news headlines, it can request both simultaneously instead of waiting for the first result before requesting the second. LangChain's ToolNode executes parallel tool calls using asyncio.gather under the hood when running in async mode, so your async tool implementations genuinely matter here.
Streaming tool results is a less-discussed but production-critical feature. In a chat UI, users stare at a blank screen while the agent thinks and calls tools. LangChain's streaming API lets you stream both the model's reasoning tokens AND tool execution events as they happen. The .astream_events() method on a compiled LangGraph emits a stream of typed events: on_chat_model_stream for token-by-token LLM output, on_tool_start when a tool begins executing, and on_tool_end when it returns — all as async generator events your UI can consume in real time.
For production deployments, the most expensive operation in a tool-heavy agent is often not the LLM itself but the aggregate latency of sequential tool calls. Profile your agents: if tool calls are sequential when they could be parallel, restructure your prompts to encourage the model to batch its requests. If a single tool call is slow (>2s), it will dominate your p95 latency. Add caching at the tool level (as shown above) and consider prefetching common tool results at session start.
parallel_tools_and_streaming.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
import asyncio
from langchain_core.tools import tool
from langchain_core.messages importHumanMessagefrom langchain_openai importChatOpenAIfrom langgraph.prebuilt import create_react_agent
import time
# --- Two slow tools that should run in PARALLEL, not sequentially ---
@tool
asyncdeffetch_user_profile(user_id: str) -> str:
"""
Fetch a user's profile from the user service by their ID.
Returns name, email, and account tier. Use when personalising responses.
"""
await asyncio.sleep(0.8) # Simulates 800ms database query
profiles = {
"user_42": {"name": "Sarah Chen", "email": "sarah@example.com", "tier": "premium"},
"user_99": {"name": "James Okafor", "email": "james@example.com", "tier": "free"},
}
if user_id notin profiles:
return f"User '{user_id}'not found"
p = profiles[user_id]
return f"Name: {p['name']}, Email: {p['email']}, Tier: {p['tier']}"
@tool
asyncdeffetch_recent_orders(user_id: str) -> str:
"""
Fetch the 3 most recent orders for a user.
Returns order IDs, items, and total amounts.
Use when user asks about their order history or past purchases.
"""
await asyncio.sleep(1.0) # Simulates 1s API call to order service
orders = {
"user_42": [
{"id": "ORD-8821", "item": "Wireless Keyboard", "total": 79.99},
{"id": "ORD-7703", "item": "USB-C Hub", "total": 34.99},
],
"user_99": [{"id": "ORD-9011", "item": "Ergonomic Mouse", "total": 49.99}],
}
if user_id notin orders:
return f"No orders found for user '{user_id}'"
order_list = ", ".join(
f"{o['id']}: {o['item']} (${o['total']})"for o in orders[user_id]
)
return f"Recent orders: {order_list}"asyncdefdemonstrate_parallel_execution():
"""
With parallel tool calling, both fetch_user_profile and fetch_recent_orders
run concurrently. Total time should be ~1s (the slower tool), not ~1.8s (sum).
"""
parallel_tools = [fetch_user_profile, fetch_recent_orders]
agent_llm = ChatOpenAI(model="gpt-4o", temperature=0)
# create_react_agent is a convenience wrapper that builds the LangGraph pattern# we built manually above — useful for standard use cases
react_agent = create_react_agent(agent_llm, parallel_tools)
user_query = (
"Give me a personalised summary for user_42. ""I need their profile AND their recent orders."
)
start_time = time.perf_counter()
result = await react_agent.ainvoke(
{"messages": [HumanMessage(content=user_query)]}
)
elapsed = time.perf_counter() - start_time
print(f"Execution time: {elapsed:.2f}s")
print(f"(If tools ran sequentially this would be ~1.8s — parallel saves ~0.8s)")
print("\nFinal response:")
print(result["messages"][-1].content)
asyncdefdemonstrate_streaming_events():
"""
Stream agent events to show the user what's happening in real time.
Thisis what you'd use to build a live 'thinking...' UI indicator.
"""
streaming_tools = [fetch_user_profile]
stream_llm = ChatOpenAI(model="gpt-4o", temperature=0)
streaming_agent = create_react_agent(stream_llm, streaming_tools)
print("\n=== STREAMING EVENTS ===")
asyncfor event in streaming_agent.astream_events(
{"messages": [HumanMessage(content="Look up profile for user_42")]},
version="v2", # v2 is the current stable events API
):
event_kind = event["event"]
if event_kind == "on_chat_model_stream":
# Stream LLM token output as it arrives
token = event["data"]["chunk"].content
if token: # Filter empty chunksprint(token, end="", flush=True)
elif event_kind == "on_tool_start":
# Notify UI that a tool is being calledprint(f"\n[UI INDICATOR] Calling tool: {event['name']}...")
elif event_kind == "on_tool_end":
# Tool finished — you could update a progress indicatorprint(f"[UI INDICATOR] Tool '{event['name']}' completed")
asyncdefmain():
awaitdemonstrate_parallel_execution()
awaitdemonstrate_streaming_events()
asyncio.run(main())
Output
Execution time: 1.07s
(If tools ran sequentially this would be ~1.8s — parallel saves ~0.8s)
Based on the profile data, Sarah Chen (user_42) is a premium tier member...
Pro Tip: Force Parallel Calls With Prompt Engineering
Models don't always batch tool calls even when they could. Add 'When you need multiple pieces of information, request all required tools in a single response rather than sequentially' to your system prompt. In tests, this simple instruction reduces total agent latency by 30-50% on multi-tool queries by eliminating unnecessary round-trips to the LLM between each tool call.
Production Insight
Cause: Sequential tool calls create an O(N) latency chain, where N is the number of tools. Effect: User-perceived latency becomes unacceptable for complex queries involving multiple data sources. Action: 1) Use prompt engineering to encourage the model to batch tool requests. 2) Ensure all tools have async _arun implementations. 3) Profile your agent to identify the slowest tool; it becomes your latency bottleneck. Cache its results agressively.
Key Takeaway
Parallel tool calls are a major performance lever, but only if your tools are async and your prompt encourages batching. Streaming is not a luxury; it's a UX requirement for interactive agents. Profile your agent's tool call sequence: the slowest single tool defines your p95 latency. Cache it or break it down.
When to Use Streaming
IfBuilding a chat UI or interactive application.
→
UseAlways use .astream_events(). Users need feedback during multi-second tool executions. A blank screen leads to abandonment.
IfRunning batch processing or backend jobs.
→
UseStreaming adds complexity with little benefit. Use .invoke() or .ainvoke() for simplicity.
IfNeed to implement custom logging or monitoring per-step.
→
UseUse streaming. You can capture on_tool_start and on_tool_end events to feed into your observability stack (e.g., log tool latency, success/failure).
● Production incidentPOST-MORTEMseverity: high
The Infinite Loop of Hallucinated Tools
Symptom
Agent response latency spiked to >30 seconds, then the conversation became a repetitive loop of 'Let me check that for you...' with no final answer. Token usage per conversation increased 800%.
Assumption
The development team assumed the LLM would only call tools that were explicitly bound to it in the tool list.
Root cause
The tool description for 'get_order_status' was too broad: "Retrieves information about an order." When a user asked about a refund, the LLM, lacking a specific refund tool, hallucinated a plausible tool name ('check_refund_status') and kept trying to call it. The ToolExecutor threw a 'tool not found' error, which was fed back to the LLM. The LLM, seeing the error, interpreted it as a transient failure and retried the same hallucinated call.
Fix
1. Made tool descriptions hyper-specific, including explicit 'Do NOT use for...' clauses.
2. Added a system prompt rule: "You may only use the tools provided in the list below. Do not invent or assume tools exist."
3. Implemented a circuit breaker in the ToolExecutor: if the same tool name fails 3 consecutive times, force the agent to generate a final answer stating it cannot complete the request.
Key lesson
The LLM's tool selection is probabilistic, not deterministic. It will 'guess' if the context is ambiguous.
Error messages from the tool executor are part of the LLM's context. A 'not found' error can be misinterpreted as a retryable failure.
Production agents need guardrails against self-reinforcing failure loops, including max iteration limits and hallucinated tool detection.
Production debug guideWhen the agent calls the wrong tool, ignores a tool, or gets stuck in a loop.4 entries
Symptom · 01
Agent consistently selects the wrong tool for a clear intent.
→
Fix
First, inspect the tool descriptions. Are they ambiguous? Does the description of Tool A overlap with the use case for Tool B? Rewrite descriptions to be mutually exclusive. Second, log the full AIMessage.tool_calls to see exactly what arguments the LLM is passing; the issue might be argument selection, not tool selection.
Symptom · 02
Agent does not use a tool when it should (e.g., answers from memory instead of fetching live data).
→
Fix
Check if the tool description explicitly states WHEN to use it. Add a strong directive: "You MUST use this tool when the user asks about current [X]. Do not answer from your training data." Also, verify the tool is correctly bound to the model; print the model's tools parameter to confirm the schema is present.
Symptom · 03
Agent enters an infinite loop, calling the same tool repeatedly with similar arguments.
→
Fix
This usually indicates the tool's output is not giving the LLM enough information to progress. Check the tool's return string. Is it an error message the LLM doesn't understand? Is it a success message that lacks the data needed for the next step? Enhance the tool's output to be more descriptive. Also, implement a hard max_iterations cap in your graph state.
Symptom · 04
Parallel tool calls fail or return out of order.
→
Fix
Ensure your tool functions are stateless and thread-safe. If they share a mutable resource (like a database connection pool), you have a race condition. Use dependency injection to provide separate resources or proper connection pooling. Also, verify you are using ToolNode which handles parallel execution correctly.
★ LangChain Agent Triage Cheat SheetFast diagnostics for common production agent failures.
Agent repeats the same action.−
Immediate action
Check for infinite loop. Force stop and inspect last 3 messages.
Add iteration_count to state and hard-stop at MAX_ITERATIONS=10. Review tool output clarity.
High latency on simple queries.+
Immediate action
Profile tool call sequence. Look for sequential calls that could be parallel.
Commands
Enable LangSmith tracing or add manual timing around `tool_executor.invoke`.
Check if model is batching tool_calls in single response (see `AIMessage.tool_calls` length).
Fix now
Add prompt instruction: "Request all required tools in one response." Implement async tools with _arun.
Tool validation errors crash the agent.+
Immediate action
Check if `handle_tool_error` is set on the executor or tool.
Commands
Wrap tool logic in try/except and return error string, do not raise.
Inspect Pydantic `args_schema` for overly strict validators that reject valid LLM output.
Fix now
Implement BaseTool subclass with _run returning strings. Use @field_validator with clear error messages.
Agent Architecture Comparison
Feature
Legacy AgentExecutor
LangGraph StateGraph
Raw While-Loop
Control Flow
Opaque, limited callbacks
Explicit nodes & conditional edges
Full control, manual state management
Human-in-the-Loop
Difficult, requires custom callbacks
First-class node support
Possible but error-prone
Debugging
Hard, state is hidden
Easy, state is inspectable at each node
Easy if you add logging
Parallel Tool Execution
Limited
Built-in via ToolNode
Manual with asyncio.gather
Streaming Support
Basic
Full event streaming
Manual implementation
Persistence/Checkpointing
Not supported
Built-in with checkpointer
Manual implementation
Best For
Simple prototypes
Most production agents
Highly custom, non-standard flows
Key takeaways
1
Tools are a contract
name, description, args_schema. The description is the LLM's primary prompt—write it like a critical API doc.
2
Use BaseTool for production. Inject dependencies, implement both _run and _arun, and decide your error-handling strategy (return string vs. raise).
3
LangGraph > AgentExecutor. Build your agent as an explicit StateGraph for control, debuggability, and features like human-in-the-loop.
4
Parallel tool calls are a major performance win. Prompt the model to batch requests and ensure your tools are async.
5
Streaming is a UX requirement, not a nice-to-have. Use .astream_events() to give users real-time feedback.
6
Guard against failure modes
infinite loops, hallucinated tools, duplicate side-effects. Build defenses into your graph and tools.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01JUNIOR
Walk me through the lifecycle of a LangChain tool call, from the LLM's d...
Q02JUNIOR
Why would you choose LangGraph over the legacy AgentExecutor for a produ...
Q03JUNIOR
How do you prevent an agent from getting stuck in an infinite loop?
Q04JUNIOR
A tool in your agent makes a slow API call (2-3 seconds). How do you opt...
Q01 of 04JUNIOR
Walk me through the lifecycle of a LangChain tool call, from the LLM's decision to the result being fed back.
ANSWER
1) Tools are bound to the LLM via .bind_tools(), serializing their Pydantic schemas into JSON Schema. 2) The LLM, when it decides a tool is needed, returns an AIMessage with a tool_calls list containing the tool name and arguments. 3) The ToolExecutor or LangGraph ToolNode intercepts this, routes each call to the matching Python function, executes it, and wraps the result in a ToolMessage. 4) This ToolMessage is appended to the conversation history, and the LLM processes it in its next reasoning step. The LLM never executes code; it only declares intent.
Q02 of 04JUNIOR
Why would you choose LangGraph over the legacy AgentExecutor for a production agent?
ANSWER
LangGraph provides an explicit, inspectable state machine. This gives me: 1) Fine-grained control over the loop with conditional edges. 2) The ability to add nodes for human approval before destructive actions. 3) Built-in support for parallel tool execution and streaming. 4) Persistent checkpointing to save and resume agent state. 5) Easier debugging, as I can inspect the state at any node. AgentExecutor is a black box that lacks these capabilities.
Q03 of 04JUNIOR
How do you prevent an agent from getting stuck in an infinite loop?
ANSWER
I use a multi-layered approach. 1) In the graph state, I maintain an iteration_count and set a hard MAX_ITERATIONS (e.g., 10). The model node checks this and forces a stop if exceeded. 2) I write hyper-specific tool descriptions with exclusion clauses to reduce the chance of the LLM making a wrong turn. 3) I design tool outputs to be informative, giving the LLM clear success or failure signals to reason about. 4) I might add a dedicated loop-detection node that looks for repeated patterns in the message history.
Q04 of 04JUNIOR
A tool in your agent makes a slow API call (2-3 seconds). How do you optimize the agent's performance?
ANSWER
First, I profile to confirm it's the bottleneck. Then, I implement several fixes: 1) Implement the tool's _arun method with an async HTTP client to avoid blocking. 2) Add a caching layer (in-memory with TTL or Redis) to the tool itself for idempotent requests. 3) Use prompt engineering to encourage the LLM to batch this tool call with others in a single response, enabling parallel execution. 4) If possible, prefetch this data at the start of the conversation based on context.
01
Walk me through the lifecycle of a LangChain tool call, from the LLM's decision to the result being fed back.
JUNIOR
02
Why would you choose LangGraph over the legacy AgentExecutor for a production agent?
JUNIOR
03
How do you prevent an agent from getting stuck in an infinite loop?
JUNIOR
04
A tool in your agent makes a slow API call (2-3 seconds). How do you optimize the agent's performance?
JUNIOR
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
Can the LLM execute code directly when I use a tool?
No. This is a critical security boundary. The LLM only outputs a structured request (a tool_call). Your orchestration layer (LangChain) executes the actual Python function. The LLM never has direct access to your filesystem, network, or runtime.
Was this helpful?
02
Should I use the simple `@tool` decorator or a `BaseTool` subclass?
Use @tool for simple, stateless tools during prototyping. Use BaseTool for anything production-grade: tools that need configuration, make network calls, require error handling, or need to be tested with mocks. BaseTool gives you dependency injection and explicit async support.
Was this helpful?
03
How do I test a LangChain tool?
If you built it with BaseTool and dependency injection, testing is straightforward. Instantiate the tool with mock dependencies (e.g., a mock HTTP client) and call tool.invoke({...}) directly, without an LLM. This tests the pure function logic. For agent-level testing, use LangGraph's streaming to inspect the tool_calls and ToolMessages at each step.
Was this helpful?
04
What's the biggest mistake people make with tool descriptions?
Being vague. A description like "Searches for information" will cause the LLM to call it for everything, including queries it can't handle. A good description states the specific use case, what it returns, and explicitly what it does NOT do (e.g., "Do NOT use for historical data").