
LangChain Tools Explained: How LLMs Take Real-World Actions

📍 Part of: Tools → Topic 8 of 12
LangChain Tools let LLMs search the web, run code, and call APIs.
🔥 Advanced — solid ML / AI foundation required
In this tutorial, you'll learn
  • Tools are a contract: name, description, args_schema. The description is the LLM's primary prompt—write it like a critical API doc.
  • Use BaseTool for production. Inject dependencies, implement both _run and _arun, and decide your error-handling strategy (return string vs. raise).
  • LangGraph > AgentExecutor. Build your agent as an explicit StateGraph for control, debuggability, and features like human-in-the-loop.
Quick Answer
  • Core components: A tool has a name (identifier), description (LLM's usage guide), and args_schema (Pydantic model for validation).
  • Execution flow: The LLM emits a tool_calls request; the ToolExecutor or LangGraph node executes the function and returns the result.
  • Production value: Transforms a frozen LLM into an agent that can fetch real-time data, perform calculations, and trigger side-effects.
  • Critical insight: The tool's description is the LLM's primary prompt. A vague or incorrect description is the #1 cause of agent failure.
  • Performance lever: Modern models support parallel tool calls. Batching requests in a single LLM response drastically reduces end-to-end latency.
  • Biggest mistake: Treating tools as simple functions. They are API contracts for a non-human agent and require defensive design, validation, and error handling.
🚨 START HERE
LangChain Agent Triage Cheat Sheet
Fast diagnostics for common production agent failures.
🟡 Agent repeats the same action.
Immediate Action: Check for an infinite loop. Force-stop and inspect the last 3 messages.
Commands
print(state['messages'][-3:])  # In a LangGraph node
grep -i 'tool_calls' agent.log | tail -5  # Check for repeated calls
Fix Now: Add `iteration_count` to state and hard-stop at MAX_ITERATIONS=10. Review tool output clarity.
🟠 High latency on simple queries.
Immediate Action: Profile the tool-call sequence. Look for sequential calls that could run in parallel.
Commands
Enable LangSmith tracing or add manual timing around `tool_executor.invoke`.
Check whether the model is batching tool_calls in a single response (see `AIMessage.tool_calls` length).
Fix Now: Add a prompt instruction: "Request all required tools in one response." Implement async tools with `_arun`.
🟡 Tool validation errors crash the agent.
Immediate Action: Check whether `handle_tool_error` is set on the tool or executor.
Commands
Wrap tool logic in try/except and return an error string; do not raise.
Inspect the Pydantic `args_schema` for overly strict validators that reject valid LLM output.
Fix Now: Implement a `BaseTool` subclass with `_run` returning strings. Use `@field_validator` with clear error messages.
Production Incident: The Infinite Loop of Hallucinated Tools
A customer support agent entered an infinite loop, repeatedly calling a non-existent 'check_refund_status' tool, confusing the user and exhausting the token budget.
Symptom: Agent response latency spiked to >30 seconds, then the conversation became a repetitive loop of 'Let me check that for you...' with no final answer. Token usage per conversation increased 800%.
Assumption: The development team assumed the LLM would only call tools that were explicitly bound to it in the tool list.
Root cause: The tool description for 'get_order_status' was too broad: "Retrieves information about an order." When a user asked about a refund, the LLM, lacking a specific refund tool, hallucinated a plausible tool name ('check_refund_status') and kept trying to call it. The ToolExecutor threw a 'tool not found' error, which was fed back to the LLM. The LLM, seeing the error, interpreted it as a transient failure and retried the same hallucinated call.
Fix: 1. Made tool descriptions hyper-specific, including explicit 'Do NOT use for...' clauses. 2. Added a system prompt rule: "You may only use the tools provided in the list below. Do not invent or assume tools exist." 3. Implemented a circuit breaker in the ToolExecutor: if the same tool name fails 3 consecutive times, force the agent to generate a final answer stating it cannot complete the request.
Key Lesson
  • The LLM's tool selection is probabilistic, not deterministic. It will 'guess' if the context is ambiguous.
  • Error messages from the tool executor are part of the LLM's context. A 'not found' error can be misinterpreted as a retryable failure.
  • Production agents need guardrails against self-reinforcing failure loops, including max iteration limits and hallucinated tool detection.
Production Debug Guide: When the agent calls the wrong tool, ignores a tool, or gets stuck in a loop.
Agent consistently selects the wrong tool for a clear intent.
First, inspect the tool descriptions. Are they ambiguous? Does the description of Tool A overlap with the use case for Tool B? Rewrite descriptions to be mutually exclusive. Second, log the full AIMessage.tool_calls to see exactly what arguments the LLM is passing; the issue might be argument selection, not tool selection.
Agent does not use a tool when it should (e.g., answers from memory instead of fetching live data).
Check whether the tool description explicitly states WHEN to use it. Add a strong directive: "You MUST use this tool when the user asks about current [X]. Do not answer from your training data." Also verify the tool is correctly bound to the model; print the model's tools parameter to confirm the schema is present.
Agent enters an infinite loop, calling the same tool repeatedly with similar arguments.
This usually indicates the tool's output is not giving the LLM enough information to progress. Check the tool's return string. Is it an error message the LLM doesn't understand? Is it a success message that lacks the data needed for the next step? Enhance the tool's output to be more descriptive, and implement a hard max_iterations cap in your graph state.
Parallel tool calls fail or return out of order.
Ensure your tool functions are stateless and thread-safe. If they share a mutable resource (like a database connection pool), you have a race condition. Use dependency injection to provide separate resources or proper connection pooling, and verify you are using ToolNode, which handles parallel execution correctly.

An LLM in isolation is a reasoning engine with no connection to the live environment. It cannot verify current facts, execute transactions, or interact with proprietary systems. This limitation makes vanilla LLMs unsuitable for most production applications where actions, not just answers, are required.

LangChain Tools provide a structured interface to bridge this gap. They are not merely function wrappers; they are a formal contract between the orchestration layer and the LLM. The contract specifies what action can be taken, when it should be used, and what inputs it requires. The LLM's role is to parse user intent and select the appropriate tool; the framework's role is to execute it safely.

A common misconception is that tools give the LLM direct access to APIs. In reality, the LLM never executes code. It only outputs a structured request. The security boundary remains intact: the orchestration layer (your code) retains full control over execution, validation, and error handling. Understanding this separation is critical for building secure, reliable agents.

How LangChain Tools Actually Work Under the Hood

A LangChain Tool is not magic — it's a Python callable wrapped in a metadata contract. That contract has three mandatory fields: a name (a short snake_case identifier the LLM uses to invoke it), a description (the natural-language prompt that tells the LLM WHEN and WHY to use this tool), and an args_schema (a Pydantic model that enforces what arguments are valid). That's the entire surface area. Everything else is implementation.

When you bind tools to a chat model using .bind_tools(), LangChain serializes those Pydantic schemas into JSON Schema and injects them into the system prompt or into the model's tools parameter (depending on the provider). The LLM sees a list of callable 'functions' in its context window. When it decides to use one, it returns an AIMessage with a tool_calls attribute — a list of structured dicts containing the tool name and arguments. Crucially, the LLM does NOT execute anything. It just declares intent.

The ToolExecutor (or a LangGraph node) picks up those tool_calls, routes each one to the matching Python function, runs it, wraps the result in a ToolMessage, and appends it back to the conversation history. The model then reads that ToolMessage and continues reasoning. This request-execute-observe loop is the entire foundation of ReAct-style agents.

The description field is more important than most developers realize. It IS the tool's API documentation for the LLM. A vague description causes the model to call the wrong tool, call it with wrong arguments, or hallucinate that a tool exists. Treat descriptions like you'd treat a well-written docstring that a new engineer has to act on without asking questions.

tool_internals_demo.py · PYTHON
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
import json

# --- Step 1: Define a strict input schema with Pydantic ---
# This schema becomes JSON Schema that gets sent to the LLM.
# Field descriptions are part of what the LLM reads to understand how to call this.
class StockLookupInput(BaseModel):
    ticker: str = Field(
        description="The stock ticker symbol in uppercase, e.g. 'AAPL' or 'MSFT'"
    )
    metric: str = Field(
        description="The metric to retrieve. One of: 'price', 'pe_ratio', 'market_cap'"
    )

# --- Step 2: Decorate with @tool and provide a rich description ---
# The description is the LLM's only guide for WHEN to use this.
# Be specific: tell it what it returns and when NOT to use it.
@tool(args_schema=StockLookupInput)
def get_stock_metric(ticker: str, metric: str) -> str:
    """
    Retrieve a real-time financial metric for a publicly traded stock.
    Use this when the user asks about current stock prices, P/E ratios,
    or market capitalisation. Do NOT use this for historical data or crypto.
    Returns a formatted string with the value and currency where applicable.
    """
    # Simulated data store — in production this would call a financial API
    mock_data = {
        "AAPL": {"price": "$189.43", "pe_ratio": "31.2", "market_cap": "$2.94T"},
        "MSFT": {"price": "$415.61", "pe_ratio": "36.8", "market_cap": "$3.08T"},
    }
    ticker = ticker.upper()
    if ticker not in mock_data:
        # Return a clear error string — never raise inside a tool unless you want
        # the agent to crash. Return errors as strings so the LLM can reason about them.
        return f"Ticker '{ticker}' not found. Available tickers: AAPL, MSFT"
    if metric not in mock_data[ticker]:
        return f"Metric '{metric}' is invalid. Valid options: price, pe_ratio, market_cap"
    return f"{ticker} {metric}: {mock_data[ticker][metric]}"

# --- Step 3: Inspect what the LLM actually sees ---
# This is the JSON Schema injected into the model's context.
print("=== Tool Name ===")
print(get_stock_metric.name)  # 'get_stock_metric'

print("\n=== Tool Description (what LLM reads) ===")
print(get_stock_metric.description)

print("\n=== Args Schema (sent as JSON Schema to the model) ===")
print(json.dumps(get_stock_metric.args_schema.model_json_schema(), indent=2))

# --- Step 4: Bind tool to model and observe the raw tool_call output ---
llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools([get_stock_metric])

response = llm_with_tools.invoke("What's Apple's current P/E ratio?")

print("\n=== AIMessage tool_calls (raw LLM output — no execution yet) ===")
for tc in response.tool_calls:
    print(json.dumps(tc, indent=2))

# --- Step 5: Manually execute the tool call ---
# In production an agent loop or LangGraph node does this automatically.
tool_call = response.tool_calls[0]
result = get_stock_metric.invoke(tool_call["args"])
print("\n=== Tool Execution Result ===")
print(result)
▶ Output
=== Tool Name ===
get_stock_metric

=== Tool Description (what LLM reads) ===
Retrieve a real-time financial metric for a publicly traded stock.
Use this when the user asks about current stock prices, P/E ratios,
or market capitalisation. Do NOT use this for historical data or crypto.
Returns a formatted string with the value and currency where applicable.

=== Args Schema (sent as JSON Schema to the model) ===
{
"properties": {
"ticker": {
"description": "The stock ticker symbol in uppercase, e.g. 'AAPL' or 'MSFT'",
"title": "Ticker",
"type": "string"
},
"metric": {
"description": "The metric to retrieve. One of: 'price', 'pe_ratio', 'market_cap'",
"title": "Metric",
"type": "string"
}
},
"required": ["ticker", "metric"],
"title": "StockLookupInput",
"type": "object"
}

=== AIMessage tool_calls (raw LLM output — no execution yet) ===
{
"name": "get_stock_metric",
"args": {
"ticker": "AAPL",
"metric": "pe_ratio"
},
"id": "call_abc123xyz",
"type": "tool_call"
}

=== Tool Execution Result ===
AAPL pe_ratio: 31.2
⚠ Watch Out: The Description IS the Prompt
If your tool description says 'searches the web' but your tool actually only searches internal docs, the LLM will confidently call it for general web queries and be confused by the results. Write descriptions like a contract: what it does, what it returns, and explicitly what it does NOT do. That last part is often the difference between a working agent and one that loops forever.
📊 Production Insight
Cause: A tool description acts as the LLM's sole instruction for tool selection. Effect: Ambiguity or inaccuracy in the description directly causes incorrect tool invocation, argument errors, or agent loops. Action: Treat tool descriptions as critical API documentation. Use a three-part structure: 1) Clear purpose, 2) Specific use-case trigger, 3) Explicit exclusion criteria. Test descriptions by having another LLM generate test queries and verify correct tool selection.
🎯 Key Takeaway
The tool's name is its API endpoint for the LLM. The description is its documentation. The args_schema is its contract. Skimping on any of these three creates a flaky agent. The most production-critical part is often the exclusion clause in the description.
Should This Be a Tool or Just a Function?
If: The function requires real-time external data (API call, DB query).
Use: Yes, make it a tool. The LLM cannot access this data itself.
If: The function performs a side-effect (send email, update record).
Use: Yes, make it a tool. Side-effects require explicit agent intent and allow for human oversight.
If: The function is pure logic/formatting that can be done from context.
Use: No. Keep it as a regular function called in your code. Unnecessary tools increase LLM decision complexity and latency.

Building Production-Grade Custom Tools with Validation and Error Handling

The @tool decorator is convenient for simple cases, but in production you'll want BaseTool subclasses. They give you explicit control over sync vs async execution, fine-grained error handling via handle_tool_error, and the ability to inject dependencies (like database sessions or API clients) at construction time rather than using module-level globals.

The key architectural decision: should your tool raise exceptions or return error strings? The answer depends on your agent architecture. In a simple ReAct loop, returning a descriptive error string lets the LLM reason about the failure and potentially retry with different arguments, which is usually what you want. Raising an exception bubbles up and typically terminates the agent run unless you have configured error handling: `handle_tool_error` on the tool itself, or `handle_tool_errors` on a LangGraph ToolNode.

Dependency injection into tools is something most tutorials skip entirely, and it's where production systems diverge from toy examples. You almost never want API keys or database connections defined at module scope inside a tool. Instead, pass them into the tool's __init__ and store them as instance attributes. This makes your tools testable (you can inject mocks), configurable per-tenant, and avoids the subtle bug where a module is imported once and caches stale credentials.

Another critical pattern is idempotency awareness. If your tool sends an email or writes to a database, you need to understand that agents can and do call tools multiple times — either due to retry logic, parallel tool calls, or the model second-guessing itself. Design write operations to be idempotent or add deduplication logic at the tool level.
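One way to sketch that deduplication is a small decorator keyed on the tool name plus its arguments. The in-memory store and `send_confirmation_email` helper below are illustrative; a production system would use Redis with a TTL:

```python
import hashlib
import json
from functools import wraps

# Hypothetical in-memory store; swap for Redis with a TTL in production
_processed: dict = {}

def idempotent(func):
    """Skip re-execution when the agent retries a write with identical arguments."""
    @wraps(func)
    def wrapper(**kwargs) -> str:
        key = hashlib.sha256(
            f"{func.__name__}:{json.dumps(kwargs, sort_keys=True)}".encode()
        ).hexdigest()
        if key in _processed:
            return f"Already done (idempotent replay): {_processed[key]}"
        result = func(**kwargs)
        _processed[key] = result
        return result
    return wrapper

@idempotent
def send_confirmation_email(order_id: str) -> str:
    # Real version would call an email API
    return f"Confirmation email sent for order {order_id}"

print(send_confirmation_email(order_id="A-100"))  # executes the send
print(send_confirmation_email(order_id="A-100"))  # replay: no duplicate email
```

Returning a distinct "already done" message (rather than silently repeating the success string) also tells the LLM that the action was not performed twice.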

production_custom_tool.py · PYTHON
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field, field_validator
from typing import Optional, Type
import httpx
import hashlib
import time

# --- Input schema with built-in validation ---
class WeatherQueryInput(BaseModel):
    city: str = Field(description="City name to get weather for, e.g. 'London' or 'Tokyo'")
    units: str = Field(
        default="celsius",
        description="Temperature units: 'celsius' or 'fahrenheit'"
    )

    # Pydantic v2 validator — catches bad input BEFORE the tool runs
    @field_validator("units")
    @classmethod
    def validate_units(cls, value: str) -> str:
        allowed = {"celsius", "fahrenheit"}
        if value.lower() not in allowed:
            raise ValueError(f"Units must be one of {allowed}, got '{value}'")
        return value.lower()

    @field_validator("city")
    @classmethod
    def validate_city_name(cls, value: str) -> str:
        # Prevent prompt injection via city name
        if len(value) > 100 or not value.replace(" ", "").replace("-", "").isalpha():
            raise ValueError("City name contains invalid characters")
        return value.strip().title()

# --- Production BaseTool with dependency injection ---
class WeatherTool(BaseTool):
    name: str = "get_current_weather"
    description: str = (
        "Retrieve current weather conditions for a city. "
        "Use this when the user asks about today's weather, temperature, "
        "or current conditions in a specific city. "
        "Returns temperature and a brief condition description. "
        "Do NOT use for forecasts — only current conditions."
    )
    args_schema: Type[BaseModel] = WeatherQueryInput

    # Injected dependencies — passed at construction, not module-scope globals
    api_key: str
    base_url: str = "https://api.openweathermap.org/data/2.5"
    request_timeout: int = 5

    # Simple in-memory deduplication to prevent redundant API calls
    # In production, use Redis with TTL instead
    _call_cache: dict = {}

    def _make_cache_key(self, city: str, units: str) -> str:
        """Cache key includes a time bucket so we don't serve stale data"""
        # Round to nearest 10-minute window for caching
        time_bucket = int(time.time() / 600)
        raw = f"{city}:{units}:{time_bucket}"
        return hashlib.md5(raw.encode()).hexdigest()

    def _run(self, city: str, units: str = "celsius") -> str:
        """
        Synchronous execution path. LangChain calls this when the tool
        is invoked in a non-async context.
        """
        cache_key = self._make_cache_key(city, units)
        if cache_key in self._call_cache:
            print(f"[WeatherTool] Cache HIT for {city}")
            return self._call_cache[cache_key]

        # Map our friendly unit names to the API's expected values
        api_units = "metric" if units == "celsius" else "imperial"
        unit_symbol = "°C" if units == "celsius" else "°F"

        try:
            response = httpx.get(
                f"{self.base_url}/weather",
                params={"q": city, "appid": self.api_key, "units": api_units},
                timeout=self.request_timeout,
            )
            # Don't raise on 404 — return a useful string so the LLM can handle it
            if response.status_code == 404:
                return f"City '{city}' not found. Check the spelling and try again."

            response.raise_for_status()  # Raise on 5xx errors
            data = response.json()

            temp = data["main"]["temp"]
            condition = data["weather"][0]["description"]
            humidity = data["main"]["humidity"]
            result = f"{city}: {temp}{unit_symbol}, {condition}, humidity {humidity}%"

            # Cache the successful result
            self._call_cache[cache_key] = result
            return result

        except httpx.TimeoutException:
            # Return recoverable error — LLM can retry or inform user
            return f"Weather service timed out after {self.request_timeout}s. Try again shortly."
        except httpx.HTTPStatusError as exc:
            return f"Weather API error (HTTP {exc.response.status_code}). Service may be down."

    async def _arun(self, city: str, units: str = "celsius") -> str:
        """
        Async execution path — called when the agent runs in an async context.
        Always implement both _run and _arun in production tools.
        Using httpx.AsyncClient here for true non-blocking IO.
        """
        cache_key = self._make_cache_key(city, units)
        if cache_key in self._call_cache:
            return self._call_cache[cache_key]

        api_units = "metric" if units == "celsius" else "imperial"
        unit_symbol = "°C" if units == "celsius" else "°F"

        async with httpx.AsyncClient(timeout=self.request_timeout) as client:
            try:
                response = await client.get(
                    f"{self.base_url}/weather",
                    params={"q": city, "appid": self.api_key, "units": api_units},
                )
                if response.status_code == 404:
                    return f"City '{city}' not found."
                response.raise_for_status()
                data = response.json()
                temp = data["main"]["temp"]
                condition = data["weather"][0]["description"]
                humidity = data["main"]["humidity"]
                result = f"{city}: {temp}{unit_symbol}, {condition}, humidity {humidity}%"
                self._call_cache[cache_key] = result
                return result
            except httpx.TimeoutException:
                return "Weather service timed out. Please retry."

# --- Constructing the tool with injected config ---
# In a real app, api_key comes from environment variables or a secrets manager
weather_tool = WeatherTool(
    api_key="your-openweathermap-api-key",
    request_timeout=8
)

# --- Direct invocation test (no LLM needed) ---
result = weather_tool.invoke({"city": "London", "units": "celsius"})
print(result)

# --- Test validation catches bad input cleanly ---
try:
    weather_tool.invoke({"city": "London", "units": "kelvin"})
except Exception as e:
    print(f"Validation caught: {e}")
▶ Output
London: 14.2°C, light rain, humidity 82%
Validation caught: 1 validation error for WeatherQueryInput
units
Value error, Units must be one of {'celsius', 'fahrenheit'}, got 'kelvin' [type=value_error, ...]
💡Pro Tip: Always Implement _arun
If you only implement _run and your tool gets used in an async LangGraph graph, LangChain will run it in a thread pool via asyncio.run_in_executor. That's fine for I/O-bound work but burns threads under load. Implement _arun with a proper async HTTP client (httpx.AsyncClient, aiohttp) for any tool that makes network calls — it's the difference between 50 concurrent users and 5.
📊 Production Insight
Cause: Using module-global resources (API keys, DB pools) inside tools creates hidden state and makes testing impossible. Effect: Tools become untestable, credentials can go stale, and multi-tenant isolation breaks. Action: Use BaseTool with dependency injection. Pass all external dependencies via __init__. This enables mock injection for unit tests, per-request credential rotation, and clean separation of concerns.
🎯 Key Takeaway
A production tool is not just a function with a decorator. It's a service client with its own lifecycle, configuration, and error contract. Use BaseTool for dependency injection, implement both _run and _arun, and decide your error strategy upfront: strings for LLM reasoning, exceptions for structured handling.
Error Handling Strategy: Return String vs Raise Exception
If: Agent is a simple ReAct loop with no special error handling.
Use: Return a descriptive error string. The LLM can reason about it and potentially adjust its approach.
If: Agent is in a LangGraph with a dedicated error-handling node or human-in-the-loop.
Use: Raise exceptions. Configure the ToolNode's `handle_tool_errors` option to catch them and route to your error handler.
If: The error is a permanent configuration issue (missing API key, invalid permissions).
Use: Raise immediately. Do not let the agent retry. This is a programmer error, not a runtime error.

ToolExecutor in LangGraph: Wiring Tools Into a Real Agent Loop

LangGraph replaced the legacy AgentExecutor as the recommended way to build agents with LangChain, and the reason is control. AgentExecutor was a black box — hard to debug, hard to add conditional logic, and nearly impossible to add human-in-the-loop approval steps. LangGraph makes the agent loop an explicit, inspectable graph where you define exactly what happens at each node.

The core pattern is a two-node graph: a model_node that calls the LLM with tools bound, and a tools_node that executes whatever tool calls the model requested. A conditional edge between them asks: 'Did the model output any tool calls?' If yes, route to the tools node. If no (meaning the model produced a final answer), route to END. That loop is the entire agent.

ToolNode (from langgraph.prebuilt) handles the boilerplate of extracting tool_calls from the last AIMessage, routing each call to the correct tool by name, running them (optionally in parallel), and wrapping results in ToolMessage objects. The messages_modifier pattern means all of this state flows through a single messages key in your graph state, making the full conversation history trivially inspectable at any point.

The real power comes when you need to break out of the simple loop: you can add a human_approval_node before the tools node that pauses execution and waits for a user confirmation before running destructive operations. You can add a retry_node that detects tool error strings and re-prompts the model differently. You can add a max_iterations counter in your graph state to prevent infinite loops — something AgentExecutor handled poorly.

langgraph_tool_agent.py · PYTHON
from typing import Annotated, Sequence
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from pydantic import BaseModel

# --- Define the tools our agent can use ---

@tool
def search_product_catalog(query: str) -> str:
    """
    Search the internal product catalog for items matching a description.
    Use this when the user asks what products we carry or wants to find
    a specific item. Returns product names, IDs, and prices.
    """
    # Simulated catalog — real version would query a vector store or DB
    catalog = [
        {"id": "P001", "name": "Wireless Keyboard", "price": 79.99},
        {"id": "P002", "name": "Ergonomic Mouse", "price": 49.99},
        {"id": "P003", "name": "USB-C Hub", "price": 34.99},
    ]
    query_lower = query.lower()
    matches = [
        f"{p['name']} (ID: {p['id']}, ${p['price']})"
        for p in catalog
        if any(word in p["name"].lower() for word in query_lower.split())
    ]
    if not matches:
        return f"No products found matching '{query}'. Try broader search terms."
    return "Found: " + ", ".join(matches)

@tool
def check_inventory(product_id: str) -> str:
    """
    Check real-time stock levels for a product by its ID.
    Use this AFTER searching the catalog to get a product ID.
    Returns stock count and estimated restock date if out of stock.
    """
    inventory = {
        "P001": {"stock": 143, "restock": None},
        "P002": {"stock": 0, "restock": "2025-09-15"},
        "P003": {"stock": 27, "restock": None},
    }
    product_id = product_id.upper()
    if product_id not in inventory:
        return f"Product ID '{product_id}' not recognised in inventory system."

    item = inventory[product_id]
    if item["stock"] == 0:
        return f"{product_id}: OUT OF STOCK. Expected restock: {item['restock']}"
    return f"{product_id}: {item['stock']} units in stock."

# Collect all tools — ToolNode needs this list to route calls by name
available_tools = [search_product_catalog, check_inventory]

# --- Define graph state ---
# Annotated with add_messages reducer: new messages are APPENDED, not overwritten
# This is critical — without the reducer you'd lose conversation history on each step
class AgentState(BaseModel):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    iteration_count: int = 0  # Guard against infinite loops
    MAX_ITERATIONS: int = 10  # Loop cap carried in state; treat the default as a constant

    class Config:
        arbitrary_types_allowed = True

# --- Wire up LLM with tools ---
catalog_llm = ChatOpenAI(model="gpt-4o", temperature=0)
catalog_llm_with_tools = catalog_llm.bind_tools(available_tools)

# --- Node 1: Model node — calls the LLM and returns its response ---
def call_model_node(state: AgentState) -> dict:
    """
    This node calls the LLM. It reads the full message history from state
    and returns the new AIMessage. LangGraph merges this with existing messages
    via the add_messages reducer.
    """
    current_iteration = state.iteration_count + 1
    print(f"[Agent] Model call — iteration {current_iteration}")

    # Safety valve: if we've looped too many times, force a stop
    if current_iteration >= state.MAX_ITERATIONS:
        forced_stop = AIMessage(
            content="I've reached the maximum number of steps. "
                    "Here's what I found so far based on available information."
        )
        return {"messages": [forced_stop], "iteration_count": current_iteration}

    response = catalog_llm_with_tools.invoke(state.messages)
    return {"messages": [response], "iteration_count": current_iteration}

# --- Node 2: ToolNode — executes all tool_calls from the last AIMessage ---
# ToolNode handles parallel execution automatically when multiple tools are called
tool_executor_node = ToolNode(tools=available_tools)

# --- Routing function: decide whether to use a tool or end ---
def should_continue_routing(state: AgentState) -> str:
    """
    Conditional edge function. Inspects the last message:
    - If it has tool_calls -> route to tools node
    - Otherwise -> route to END (agent has its final answer)
    """
    last_message = state.messages[-1]
    # isinstance check ensures we're looking at an AIMessage, not a ToolMessage
    if isinstance(last_message, AIMessage) and last_message.tool_calls:
        print(f"[Router] Routing to tools: {[tc['name'] for tc in last_message.tool_calls]}")
        return "use_tools"
    print("[Router] No tool calls — agent finished")
    return "end"

# --- Build the graph ---
agent_graph = StateGraph(AgentState)

# Register nodes
agent_graph.add_node("model", call_model_node)
agent_graph.add_node("tools", tool_executor_node)

# Set entry point
agent_graph.set_entry_point("model")

# Conditional edge from model: either call tools or finish
agent_graph.add_conditional_edges(
    "model",
    should_continue_routing,
    {
        "use_tools": "tools",  # route name -> node name
        "end": END,
    },
)

# After tools execute, always go back to model to process results
agent_graph.add_edge("tools", "model")

# Compile into a runnable
runnable_agent = agent_graph.compile()

# --- Run the agent ---
print("=== STARTING AGENT ===")
initial_state = {"messages": [HumanMessage(content="Do you have any keyboards in stock?")]}

final_state = runnable_agent.invoke(initial_state)

print("\n=== FINAL ANSWER ===")
print(final_state["messages"][-1].content)
print(f"Total iterations: {final_state['iteration_count']}")
▶ Output
=== STARTING AGENT ===
[Agent] Model call — iteration 1
[Router] Routing to tools: ['search_product_catalog']
[Agent] Model call — iteration 2
[Router] Routing to tools: ['check_inventory']
[Agent] Model call — iteration 3
[Router] No tool calls — agent finished

=== FINAL ANSWER ===
Yes! We carry a Wireless Keyboard (ID: P001, priced at $79.99), and it's well-stocked with 143 units available. Would you like to place an order?
Total iterations: 3
🔥Interview Gold: AgentExecutor vs LangGraph
Interviewers love asking why you'd choose LangGraph over the legacy AgentExecutor. The answer: LangGraph gives you an explicit, inspectable state machine where YOU control the loop. You can add human-approval nodes, conditional branching based on tool output content, parallel tool execution, and persistent checkpointing mid-run. AgentExecutor is a black box that runs until it decides it's done — you have very limited control over what happens in between.
📊 Production Insight
Cause: The legacy AgentExecutor is an opaque loop with limited hooks for customization. Effect: You cannot easily add human approval, implement complex retry logic, or inject stateful middleware without hacky callbacks. Action: Use LangGraph. Define your agent as a StateGraph. This gives you explicit control nodes, conditional edges, and the ability to inspect/modify state at any step. It's the difference between a vending machine and a kitchen.
🎯 Key Takeaway
LangGraph transforms the agent loop from a black box into a white-box state machine. The core pattern is a model_node and a tools_node with a conditional edge. Start simple, then add nodes for human-in-the-loop, error recovery, and state management. This explicit control is non-negotiable for production agents.
When to Add a Node to Your Agent Graph
If: Tool performs a destructive or costly operation (delete, purchase, send).
Use: Add a human_approval_node before the tools_node. Pause and wait for confirmation.
If: Tool calls frequently fail with recoverable errors.
Use: Add a retry_node after the tools_node. Check ToolMessage content for error strings and re-prompt the LLM with adjusted context.
If: Agent conversations become long and context limits are hit.
Use: Add a summarization_node. Trigger it when message count exceeds a threshold. Summarize history and replace the messages list.
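The first row above (gating destructive tools behind human approval) ultimately comes down to a routing function. Setting LangGraph specifics aside, here is a minimal pure-Python sketch of just the routing decision; the tool names and the message shape are illustrative, not a real API:

```python
# Routing decision for a human-approval gate: any tool call whose name is in
# DESTRUCTIVE_TOOLS must pass through an approval node before execution.
# Tool names and the message dict shape here are hypothetical stand-ins.
DESTRUCTIVE_TOOLS = {"delete_order", "send_email", "charge_card"}

def route_after_model(last_message: dict) -> str:
    """Return the next node name based on the model's last message."""
    tool_calls = last_message.get("tool_calls", [])
    if not tool_calls:
        return "end"  # no tools requested, so the agent has its final answer
    if any(tc["name"] in DESTRUCTIVE_TOOLS for tc in tool_calls):
        return "human_approval"  # pause and wait for confirmation first
    return "tools"  # safe tools go straight to execution

print(route_after_model({"tool_calls": [{"name": "send_email"}]}))   # human_approval
print(route_after_model({"tool_calls": [{"name": "check_stock"}]}))  # tools
print(route_after_model({}))                                         # end
```

In a real graph this function would be the conditional-edge callable, with "human_approval" mapped to a node that interrupts the run until a human confirms.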

Parallel Tool Calls, Streaming, and Performance at Scale

Modern models (GPT-4o, Claude 3.5, Gemini 1.5 Pro) can emit multiple tool calls in a single response. This is a huge performance win — if the model needs both the weather in London AND today's news headlines, it can request both simultaneously instead of waiting for the first result before requesting the second. LangChain's ToolNode executes parallel tool calls using asyncio.gather under the hood when running in async mode, so your async tool implementations genuinely matter here.
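The concurrency win is plain asyncio underneath. A stripped-down sketch of what a parallel tool executor does with a batch of tool calls; the tool functions, registry, and call format are stand-ins, not LangChain's internals:

```python
import asyncio
import time

# Stand-in async tools; in a real agent these would be your @tool coroutines.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.8)  # simulates a slow weather API
    return f"Weather in {city}: 18C"

async def get_headlines(topic: str) -> str:
    await asyncio.sleep(1.0)  # simulates a slow news API
    return f"Top {topic} headline: markets rally"

TOOL_REGISTRY = {"get_weather": get_weather, "get_headlines": get_headlines}

async def execute_tool_calls(tool_calls: list[dict]) -> list[str]:
    """Run every requested tool concurrently; results come back in call order."""
    coros = [TOOL_REGISTRY[tc["name"]](**tc["args"]) for tc in tool_calls]
    return await asyncio.gather(*coros)

async def main() -> float:
    calls = [
        {"name": "get_weather", "args": {"city": "London"}},
        {"name": "get_headlines", "args": {"topic": "tech"}},
    ]
    start = time.perf_counter()
    results = await execute_tool_calls(calls)
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed ~{elapsed:.1f}s")  # near 1.0s (the slower tool), not 1.8s (the sum)
    return elapsed

asyncio.run(main())
```

Total wall time is bounded by the slowest tool, not the sum, which is exactly the behavior ToolNode gives you for free when your tools are async.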

Streaming tool results is a less-discussed but production-critical feature. In a chat UI, users stare at a blank screen while the agent thinks and calls tools. LangChain's streaming API lets you stream both the model's reasoning tokens AND tool execution events as they happen. The .astream_events() method on a compiled LangGraph emits a stream of typed events: on_chat_model_stream for token-by-token LLM output, on_tool_start when a tool begins executing, and on_tool_end when it returns — all as async generator events your UI can consume in real time.

For production deployments, the most expensive operation in a tool-heavy agent is often not the LLM itself but the aggregate latency of sequential tool calls. Profile your agents: if tool calls are sequential when they could be parallel, restructure your prompts to encourage the model to batch its requests. If a single tool call is slow (>2s), it will dominate your p95 latency. Add caching at the tool level (as shown above) and consider prefetching common tool results at session start.

parallel_tools_and_streaming.py · PYTHON
import asyncio
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
import time

# --- Two slow tools that should run in PARALLEL, not sequentially ---

@tool
async def fetch_user_profile(user_id: str) -> str:
    """
    Fetch a user's profile from the user service by their ID.
    Returns name, email, and account tier. Use when personalising responses.
    """
    await asyncio.sleep(0.8)  # Simulates 800ms database query
    profiles = {
        "user_42": {"name": "Sarah Chen", "email": "sarah@example.com", "tier": "premium"},
        "user_99": {"name": "James Okafor", "email": "james@example.com", "tier": "free"},
    }
    if user_id not in profiles:
        return f"User '{user_id}' not found"
    p = profiles[user_id]
    return f"Name: {p['name']}, Email: {p['email']}, Tier: {p['tier']}"

@tool
async def fetch_recent_orders(user_id: str) -> str:
    """
    Fetch the 3 most recent orders for a user.
    Returns order IDs, items, and total amounts.
    Use when user asks about their order history or past purchases.
    """
    await asyncio.sleep(1.0)  # Simulates 1s API call to order service
    orders = {
        "user_42": [
            {"id": "ORD-8821", "item": "Wireless Keyboard", "total": 79.99},
            {"id": "ORD-7703", "item": "USB-C Hub", "total": 34.99},
        ],
        "user_99": [{"id": "ORD-9011", "item": "Ergonomic Mouse", "total": 49.99}],
    }
    if user_id not in orders:
        return f"No orders found for user '{user_id}'"
    order_list = ", ".join(
        f"{o['id']}: {o['item']} (${o['total']})" for o in orders[user_id]
    )
    return f"Recent orders: {order_list}"

async def demonstrate_parallel_execution():
    """
    With parallel tool calling, both fetch_user_profile and fetch_recent_orders
    run concurrently. Total time should be ~1s (the slower tool), not ~1.8s (sum).
    """
    parallel_tools = [fetch_user_profile, fetch_recent_orders]
    agent_llm = ChatOpenAI(model="gpt-4o", temperature=0)

    # create_react_agent is a convenience wrapper that builds the LangGraph pattern
    # we built manually above — useful for standard use cases
    react_agent = create_react_agent(agent_llm, parallel_tools)

    user_query = (
        "Give me a personalised summary for user_42. "
        "I need their profile AND their recent orders."
    )

    start_time = time.perf_counter()
    result = await react_agent.ainvoke(
        {"messages": [HumanMessage(content=user_query)]}
    )
    elapsed = time.perf_counter() - start_time

    print(f"Execution time: {elapsed:.2f}s")
    print("(If tools ran sequentially this would be ~1.8s — parallel saves ~0.8s)")
    print("\nFinal response:")
    print(result["messages"][-1].content)

async def demonstrate_streaming_events():
    """
    Stream agent events to show the user what's happening in real time.
    This is what you'd use to build a live 'thinking...' UI indicator.
    """
    streaming_tools = [fetch_user_profile]
    stream_llm = ChatOpenAI(model="gpt-4o", temperature=0)
    streaming_agent = create_react_agent(stream_llm, streaming_tools)

    print("\n=== STREAMING EVENTS ===")
    async for event in streaming_agent.astream_events(
        {"messages": [HumanMessage(content="Look up profile for user_42")]},
        version="v2",  # v2 is the current stable events API
    ):
        event_kind = event["event"]

        if event_kind == "on_chat_model_stream":
            # Stream LLM token output as it arrives
            token = event["data"]["chunk"].content
            if token:  # Filter empty chunks
                print(token, end="", flush=True)

        elif event_kind == "on_tool_start":
            # Notify UI that a tool is being called
            print(f"\n[UI INDICATOR] Calling tool: {event['name']}...")

        elif event_kind == "on_tool_end":
            # Tool finished — you could update a progress indicator
            print(f"[UI INDICATOR] Tool '{event['name']}' completed")

async def main():
    await demonstrate_parallel_execution()
    await demonstrate_streaming_events()

asyncio.run(main())
▶ Output
Execution time: 1.07s
(If tools ran sequentially this would be ~1.8s — parallel saves ~0.8s)

Final response:
Here's the summary for Sarah Chen (user_42):
- Account Tier: Premium
- Email: sarah@example.com
- Recent Orders: ORD-8821 (Wireless Keyboard, $79.99), ORD-7703 (USB-C Hub, $34.99)

As a premium member, Sarah is eligible for priority support and free shipping on all orders.

=== STREAMING EVENTS ===
[UI INDICATOR] Calling tool: fetch_user_profile...
[UI INDICATOR] Tool 'fetch_user_profile' completed
Based on the profile data, Sarah Chen (user_42) is a premium tier member...
💡Pro Tip: Force Parallel Calls With Prompt Engineering
Models don't always batch tool calls even when they could. Add 'When you need multiple pieces of information, request all required tools in a single response rather than sequentially' to your system prompt. In tests, this simple instruction reduces total agent latency by 30-50% on multi-tool queries by eliminating unnecessary round-trips to the LLM between each tool call.
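Wiring that hint in is a one-line change to your system prompt. A sketch using generic chat-message dicts; the exact wording is one option, not a magic string:

```python
# Assumed instruction text — tune the wording for your own agent.
BATCHING_HINT = (
    "When you need multiple independent pieces of information, request all "
    "required tools in a single response rather than one at a time."
)

def build_messages(system_prompt: str, user_query: str) -> list[dict]:
    """Prepend the batching hint to whatever system prompt you already use."""
    return [
        {"role": "system", "content": f"{system_prompt}\n\n{BATCHING_HINT}"},
        {"role": "user", "content": user_query},
    ]

msgs = build_messages(
    "You are a retail support agent.",
    "Give me the profile and recent orders for user_42.",
)
```

The same text works as the prompt argument to create_react_agent or as a SystemMessage at the head of your state's message list.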
📊 Production Insight
Cause: Sequential tool calls create an O(N) latency chain, where N is the number of tools. Effect: User-perceived latency becomes unacceptable for complex queries involving multiple data sources. Action: 1) Use prompt engineering to encourage the model to batch tool requests. 2) Ensure all tools have async _arun implementations. 3) Profile your agent to identify the slowest tool; it becomes your latency bottleneck. Cache its results aggressively.
🎯 Key Takeaway
Parallel tool calls are a major performance lever, but only if your tools are async and your prompt encourages batching. Streaming is not a luxury; it's a UX requirement for interactive agents. Profile your agent's tool call sequence: the slowest single tool defines your p95 latency. Cache it or break it down.
When to Use Streaming
If: Building a chat UI or interactive application.
Use: Always use .astream_events(). Users need feedback during multi-second tool executions. A blank screen leads to abandonment.
If: Running batch processing or backend jobs.
Use: Streaming adds complexity with little benefit. Use .invoke() or .ainvoke() for simplicity.
If: Need to implement custom logging or monitoring per-step.
Use: Use streaming. You can capture on_tool_start and on_tool_end events to feed into your observability stack (e.g., log tool latency, success/failure).
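The observability case reduces to pairing start and end events per tool run. A pure-Python sketch over mock event dicts; the field names mirror the astream_events v2 shape, but treat them (and the "ts" timestamp, which stands in for a time captured at event receipt) as assumptions:

```python
def record_tool_latencies(events: list[dict]) -> list[tuple[str, float]]:
    """Pair on_tool_start/on_tool_end events by run_id; return (tool, latency)."""
    starts: dict[str, float] = {}
    latencies: list[tuple[str, float]] = []
    for ev in events:
        if ev["event"] == "on_tool_start":
            starts[ev["run_id"]] = ev["ts"]
        elif ev["event"] == "on_tool_end":
            began = starts.pop(ev["run_id"], None)
            if began is not None:
                latencies.append((ev["name"], ev["ts"] - began))
    return latencies

# Mock events in the order a stream would deliver them:
mock_events = [
    {"event": "on_tool_start", "name": "fetch_user_profile", "run_id": "r1", "ts": 0.0},
    {"event": "on_tool_end",   "name": "fetch_user_profile", "run_id": "r1", "ts": 0.8},
]
print(record_tool_latencies(mock_events))  # [('fetch_user_profile', 0.8)]
```

In production you would emit each (tool, latency) pair to your metrics backend instead of collecting them in a list.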
🗂 Agent Architecture Comparison
Choosing the right loop for your production agent.
Feature | Legacy AgentExecutor | LangGraph StateGraph | Raw While-Loop
Control Flow | Opaque, limited callbacks | Explicit nodes & conditional edges | Full control, manual state management
Human-in-the-Loop | Difficult, requires custom callbacks | First-class node support | Possible but error-prone
Debugging | Hard, state is hidden | Easy, state is inspectable at each node | Easy if you add logging
Parallel Tool Execution | Limited | Built-in via ToolNode | Manual with asyncio.gather
Streaming Support | Basic | Full event streaming | Manual implementation
Persistence/Checkpointing | Not supported | Built-in with checkpointer | Manual implementation
Best For | Simple prototypes | Most production agents | Highly custom, non-standard flows

🎯 Key Takeaways

  • Tools are a contract: name, description, args_schema. The description is the LLM's primary prompt—write it like a critical API doc.
  • Use BaseTool for production. Inject dependencies, implement both _run and _arun, and decide your error-handling strategy (return string vs. raise).
  • LangGraph > AgentExecutor. Build your agent as an explicit StateGraph for control, debuggability, and features like human-in-the-loop.
  • Parallel tool calls are a major performance win. Prompt the model to batch requests and ensure your tools are async.
  • Streaming is a UX requirement, not a nice-to-have. Use .astream_events() to give users real-time feedback.
  • Guard against failure modes: infinite loops, hallucinated tools, duplicate side-effects. Build defenses into your graph and tools.

⚠ Common Mistakes to Avoid

    Writing vague tool descriptions like 'searches for information'. The LLM will call it for everything.
    Not implementing `_arun` for I/O-bound tools, causing thread exhaustion under load.
    Letting tools raise raw exceptions, which crashes the agent instead of letting it reason about the failure.
    Using module-global variables for API keys or DB connections inside tools, breaking testability.
    Forgetting that agents can call tools multiple times, leading to duplicate side-effects (e.g., sending an email twice).
    Not setting a `max_iterations` guard, allowing the agent to loop forever on a confused state.
    Using the legacy `AgentExecutor` for production, losing access to debugging, control, and human-in-the-loop features.
    Ignoring parallel tool call support, accepting N sequential LLM round-trips when a single batched round-trip is possible.
    Not caching results of slow, idempotent tool calls, paying the latency cost on every agent turn.

Interview Questions on This Topic

  • Q: Walk me through the lifecycle of a LangChain tool call, from the LLM's decision to the result being fed back.
    1) Tools are bound to the LLM via .bind_tools(), serializing their Pydantic schemas into JSON Schema. 2) The LLM, when it decides a tool is needed, returns an AIMessage with a tool_calls list containing the tool name and arguments. 3) The ToolExecutor or LangGraph ToolNode intercepts this, routes each call to the matching Python function, executes it, and wraps the result in a ToolMessage. 4) This ToolMessage is appended to the conversation history, and the LLM processes it in its next reasoning step. The LLM never executes code; it only declares intent.
  • Q: Why would you choose LangGraph over the legacy AgentExecutor for a production agent?
    LangGraph provides an explicit, inspectable state machine. This gives me: 1) Fine-grained control over the loop with conditional edges. 2) The ability to add nodes for human approval before destructive actions. 3) Built-in support for parallel tool execution and streaming. 4) Persistent checkpointing to save and resume agent state. 5) Easier debugging, as I can inspect the state at any node. AgentExecutor is a black box that lacks these capabilities.
  • Q: How do you prevent an agent from getting stuck in an infinite loop?
    I use a multi-layered approach. 1) In the graph state, I maintain an iteration_count and set a hard MAX_ITERATIONS (e.g., 10). The model node checks this and forces a stop if exceeded. 2) I write hyper-specific tool descriptions with exclusion clauses to reduce the chance of the LLM making a wrong turn. 3) I design tool outputs to be informative, giving the LLM clear success or failure signals to reason about. 4) I might add a dedicated loop-detection node that looks for repeated patterns in the message history.
  • Q: A tool in your agent makes a slow API call (2-3 seconds). How do you optimize the agent's performance?
    First, I profile to confirm it's the bottleneck. Then, I implement several fixes: 1) Implement the tool's _arun method with an async HTTP client to avoid blocking. 2) Add a caching layer (in-memory with TTL or Redis) to the tool itself for idempotent requests. 3) Use prompt engineering to encourage the LLM to batch this tool call with others in a single response, enabling parallel execution. 4) If possible, prefetch this data at the start of the conversation based on context.

Frequently Asked Questions

Can the LLM execute code directly when I use a tool?

No. This is a critical security boundary. The LLM only outputs a structured request (a tool_call). Your orchestration layer (LangChain) executes the actual Python function. The LLM never has direct access to your filesystem, network, or runtime.
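The shape of that boundary in data: the model emits a declarative request, and only your code dereferences it into a function call. A minimal sketch with a plain dict payload (the exact schema varies by provider, and the registry pattern here is illustrative):

```python
# What the LLM returns: data, not code. It names a tool and its arguments.
llm_output = {
    "tool_calls": [
        {"name": "check_inventory", "args": {"product_id": "P001"}},
        {"name": "rm_rf_slash", "args": {}},  # hallucinated tool name
    ]
}

# What YOUR code does: look the name up in an allowlist and execute it.
def check_inventory(product_id: str) -> str:
    return f"{product_id}: 143 units in stock"  # stand-in for a real lookup

SAFE_REGISTRY = {"check_inventory": check_inventory}

for call in llm_output["tool_calls"]:
    fn = SAFE_REGISTRY.get(call["name"])
    if fn is None:
        continue  # unknown/hallucinated tool: no code runs at all
    print(fn(**call["args"]))  # prints "P001: 143 units in stock"
```

Because execution always passes through your registry, a hallucinated tool name is a no-op rather than an exploit.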

Should I use the simple `@tool` decorator or a `BaseTool` subclass?

Use @tool for simple, stateless tools during prototyping. Use BaseTool for anything production-grade: tools that need configuration, make network calls, require error handling, or need to be tested with mocks. BaseTool gives you dependency injection and explicit async support.

How do I test a LangChain tool?

If you built it with BaseTool and dependency injection, testing is straightforward. Instantiate the tool with mock dependencies (e.g., a mock HTTP client) and call tool.invoke({...}) directly, without an LLM. This tests the pure function logic. For agent-level testing, use LangGraph's streaming to inspect the tool_calls and ToolMessages at each step.
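Stripped of LangChain specifics, the dependency-injection pattern looks like this: the tool takes its client in the constructor, so a test can hand it a stub instead of a real network client. Class and method names here are illustrative:

```python
class InventoryTool:
    """A tool whose only dependency (the HTTP client) is injected."""

    def __init__(self, http_client):
        self.http = http_client

    def invoke(self, product_id: str) -> str:
        data = self.http.get(f"/inventory/{product_id}")
        return f"{product_id}: {data['stock']} units"

class StubClient:
    """Test double: records calls and returns canned data, no network."""

    def __init__(self):
        self.calls = []

    def get(self, path):
        self.calls.append(path)
        return {"stock": 143}

# The test exercises the tool's logic directly, with no LLM and no network.
stub = StubClient()
tool = InventoryTool(http_client=stub)
assert tool.invoke("P001") == "P001: 143 units"
assert stub.calls == ["/inventory/P001"]
```

A BaseTool subclass works the same way: pass the mock as a field at construction time and call tool.invoke({...}) in the test.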

What's the biggest mistake people make with tool descriptions?

Being vague. A description like "Searches for information" will cause the LLM to call it for everything, including queries it can't handle. A good description states the specific use case, what it returns, and explicitly what it does NOT do (e.g., "Do NOT use for historical data").

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged