Senior 6 min · March 06, 2026

Celery Tasks — Why Redis LRU Silently Drops Your Jobs

Redis allkeys-lru evicts Celery tasks under memory pressure with zero errors.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Celery decouples task production from execution via a message broker
  • Workers pull tasks asynchronously — scale horizontally by adding more workers
  • Result backend stores task results; omit if you only need fire-and-forget
  • Performance: broker throughput caps at ~10k tasks/sec on a single Redis instance
  • Production pitfall: task serialization with pickle opens remote code execution — use JSON or msgpack
  • Biggest mistake: assuming tasks run exactly once — at-least-once is the default guarantee
Plain-English First

Imagine a busy restaurant. The waiter takes your order and hands a ticket to the kitchen — he doesn't stand there waiting for the food to cook. The kitchen works through tickets at its own pace, even if 50 orders pile up. Celery is that ticketing system for your Python app. Your web server hands off slow jobs — sending emails, resizing images, crunching reports — to a queue, then immediately moves on. Workers in the background chew through those jobs whenever they're ready.

Every production web app eventually hits the same wall: a request that takes too long. Maybe it's sending a welcome email, generating a PDF, calling a slow third-party API, or processing an uploaded video. If you do that work inside the HTTP request cycle, your users stare at a spinner — and under traffic, your server buckles. This isn't a Python problem. It's a distributed systems problem, and it needs a distributed systems solution.

Celery solves it by decoupling task production from task execution. Your web process serializes a task, drops it onto a message broker (Redis or RabbitMQ), and returns instantly. Separate worker processes — as many as you need, on as many machines as you want — pull tasks off the queue and execute them independently. You get horizontal scalability, fault isolation, retry logic, scheduling, and observability, all from one library that integrates cleanly with Django, FastAPI, and plain Python alike.

By the end of this article you'll understand how Celery routes messages under the hood, why task serialization choices matter for security, how to build reliable retry strategies that don't hammer downstream services, how to compose complex workflows with Canvas primitives, and which production mistakes cost teams days of debugging. This is not a hello-world walkthrough — it's the mental model you need to run Celery confidently at scale.

Celery Architecture: Broker, Workers, and Result Backend

Celery's architecture has three key parts. The broker is the message transport — Redis or RabbitMQ. Your app (the producer) serializes a task and publishes it to an exchange on the broker. Workers are separate processes that subscribe to queues and consume those messages. The result backend is optional — a database where task return values or statuses are stored for later retrieval.

When you call add.delay(2, 2), Celery does this: 1. Serializes the task name, arguments, and kwargs using the configured serializer (JSON by default). 2. Sends the message to the broker's default exchange. 3. The broker routes it to the appropriate queue based on routing key and exchange bindings. 4. An idle worker picks up the message from the queue, deserializes it, and executes the function. 5. If a result backend is configured, the return value is stored. Otherwise the result is gone after execution.

The critical point: the broker and result backend are separate concerns. You can use Redis for both, but in production it's common to use RabbitMQ for reliability and PostgreSQL for results.

app.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# io.thecodeforge.celery_app/app.py
from io.thecodeforge.celery_app import celery_app

@celery_app.task
def send_welcome_email(user_id: int) -> dict:
    # Simulated email sending
    return {"user_id": user_id, "status": "sent"}

# Producer side: result is AsyncResult
from celery.result import AsyncResult
result = send_welcome_email.delay(42)
print(result.id)  # uuid of the task
print(result.status)  # PENDING before execution
print(result.get(timeout=30))  # blocks until done

# Without result backend, get() raises an exception
Mental Model: Think of Celery as a Post Office
  • Producer writes a message (serialized task) and hands it to the post office.
  • The post office (broker) stores the message until a worker picks it up.
  • Worker reads the message, processes it, and optionally writes a receipt (result) to a central ledger.
  • Without a result backend, the receipt is torn up after delivery — you never know if it arrived.
  • Scaling: add more workers (carriers) to handle more mail, or more post offices (broker clusters) for fault tolerance.
Production Insight
Using the same Redis instance for both cache and broker is a common prod failure. Cache evictions (LRU) silently remove pending tasks.
Separate your broker Redis database (db=1) from the cache database (db=0). Use a dedicated Redis cluster for tasks if throughput > 1000 tasks/second.
Rule: the broker must never evict data — set maxmemory-policy noeviction and monitor memory usage.
Key Takeaway
The broker holds tasks in flight, the result backend tracks outcomes.
If you don't need to know the result, skip the result backend.
Never share the broker's Redis database with caching — use dedicated instances.
Choosing a Broker & Result Backend
IfLow volume, simple tasks, need fast setup
UseUse Redis as broker and Redis as result backend (single-server).
IfHigh reliability, need persistence
UseUse RabbitMQ as broker (persistent messages), PostgreSQL or RDBMS as result backend.
IfFire-and-forget tasks, no result needed
UseOmit result backend entirely. Save memory and simplify ops.
IfDistributed workers across regions
UseUse RabbitMQ with mirrored queues or Redis Sentinel for HA.

Task Definitions, Serialization, and Security

Tasks are defined by decorating functions with @app.task. The function signature, module path, and arguments are serialized and sent to workers. By default, Celery uses JSON serializer — safe and fast, but cannot handle custom Python objects, datetimes, or sets.

The serializer choice is a security boundary. The pickle serializer can deserialize arbitrary Python objects, opening a Remote Code Execution (RCE) vulnerability if an attacker can inject messages. In production, always restrict serializers to JSON or msgpack. If you need custom types, register custom encoders rather than falling back to pickle.

Task options like bind=True pass the task instance as the first argument (self), enabling access to task state, request, and retry methods. name overrides the autogenerated path-based name, useful for versioning or renaming tasks without breaking pending messages.

A subtle gotcha: when you rename the function or move it to a different module, tasks already in the queue with the old name will fail with NotRegistered. Always use an explicit name and map old names with task_names routing if doing a rolling deployment.

tasks.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# io/thecodeforge/celery_app/tasks.py
from io.thecodeforge.celery_app import celery_app

# Using JSON serializer (default) - safe for primitives
@celery_app.task(
    name="io.thecodeforge.send_report",
    serializer="json",
    autoretry_for=(ConnectionError, TimeoutError),
    max_retries=3,
)
def send_report(report_id: int, email: str) -> dict:
    # ... send report
    return {"report_id": report_id, "email": email, "status": "sent"}

# Custom serializer: handle datetime objects
from datetime import datetime
from io.thecodeforge.celery_app import celery_app

@celery_app.task(serializer="msgpack")
def schedule_backup(timestamp: datetime):
    # msgpack can handle datetime
    pass

# Explicit name to survive module moves
@celery_app.task(name="legacy.process_payment")
def process_payment(user_id: int, amount: float):
    pass
Security: Never Use Pickle in Production
If your Celery config has task_serializer = 'pickle', any attacker who can send messages to the broker (e.g., through a compromised web endpoint) can execute arbitrary code. Always set accept_content = ['json'] and task_serializer = 'json' in your Django or Flask config. Use msgpack if you need performance or special types. Let the warning be clear: pickle in production is a matter of when, not if, you'll get pwned.
Production Insight
A team ran Celery with pickle for years — until an SSRF vulnerability allowed an attacker to craft malicious task payloads. Workers became remote code execution servers.
Switch to JSON/msgpack and explicitly set accept_content to a whitelist. For legacy tasks, use the @celery_app.task(serializer='pickle') per-task override, never globally.
Rule: restrict accepted content types. If you don't, you're trusting every API client that touches the broker.
Key Takeaway
JSON is safe and fast. msgpack is faster and handles more types. Pickle is a loaded weapon.
Explicitly set accept_content and task_serializer in your Celery config.
For renamed tasks, use explicit name and old name mapping during rolling upgrades.

Routing Tasks to Queues and Workers

By default, all tasks go to a single queue named celery. That works for small setups, but as you grow, you need to separate concerns: one queue for email tasks, another for image processing, a third for report generation. This prevents a burst of image tasks from starving email delivery.

Celery routing works through the AMQP model (or simple Redis key matching). You define a Queue object with an exchange and routing key. Then assign a task's queue option, or use a task_routes config dict to route by task name pattern.

On the worker side, you specify which queues to consume with the -Q flag. A single worker can handle multiple queues, but it's common to have dedicated workers per queue for resource isolation (e.g., CPU-heavy image workers vs I/O-heavy email workers).

The classic mistake: forgetting to restart workers after adding new routing rules. Stale workers don't subscribe to new queues, tasks pile up in unbound queues, and you debug for hours before noticing.

config.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# io/thecodeforge/celery_app/config.py
from kombu import Queue, Exchange

celery_app.conf.task_queues = [
    Queue('default',    Exchange('default'),      routing_key='default'),
    Queue('emails',     Exchange('emails'),       routing_key='email.*'),
    Queue('images',     Exchange('images'),       routing_key='image.*'),
    Queue('reports',    Exchange('reports'),      routing_key='report.*'),
]

# Route by task name pattern
celery_app.conf.task_routes = {
    'io.thecodeforge.tasks.*': {'queue': 'default'},
    'io.thecodeforge.tasks.send_welcome_email': {'queue': 'emails'},
    'io.thecodeforge.tasks.resize_image': {'queue': 'images'},
    'io.thecodeforge.tasks.generate_sales_report': {'queue': 'reports'},
}

# Start a worker that only processes emails:
# celery -A io.thecodeforge.celery_app worker -Q emails -c 10
Production Insight
A misrouted task can silently sit in a queue no worker is listening to. The producer gets no error — the broker accepts it, but it's never consumed.
Always set task_routes explicitly and monitor queue lengths with a tool like Flower or a simple script. Alert if any queue exceeds expected depth.
Rule: every queue must have at least one active worker subscriber. Monitor celery inspect active_queues in your healthcheck.
Key Takeaway
Use separate queues for different task types to isolate failures and resource consumption.
Start workers with -Q queue1,queue2 to subscribe to specific queues.
Monitor queue depth — if a queue grows unbounded, no one is consuming it.

Retry Strategies: Exponential Backoff, Max Retries, and Dead Letter Patterns

Distributed systems fail. Networks drop, databases restart, third-party APIs rate-limit. Celery provides built-in retry mechanisms through the autoretry_for task option and the retry() method callable from inside a task.

autoretry_for takes a tuple of exception classes. When a task raises one of those, Celery automatically re-queues it with a delay. You control the backoff with retry_backoff, retry_backoff_max, and retry_jitter options. Default backoff is exponential: wait 1s, then 2s, 4s, 8s... up to retry_backoff_max (default 10 minutes).

For more fine-grained control, call self.retry(countdown=60) inside the task. You can pass exc to log the original exception, and max_retries to cap attempts.

A retry that succeeds on the Nth attempt does not make the previous failures visible. For critical tasks, log each retry and its exception. Consider implementing a dead letter pattern: after exhausting retries, route the task to a separate dead letter queue for manual inspection, rather than discarding it silently.

The biggest retry mistake: retrying on exceptions that won't recover (e.g., ValueError due to invalid input). You'll burn cycles and never succeed. Only retry on transient failures.

tasks_retry.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# io/thecodeforge/celery_app/tasks_retry.py
from io.thecodeforge.celery_app import celery_app
import requests

# Autoretry with exponential backoff
@celery_app.task(
    autoretry_for=(requests.ConnectionError, requests.Timeout, IOError),
    retry_backoff=True,        # 1, 2, 4, 8, ... seconds
    retry_backoff_max=300,     # max 5 minutes between retries
    retry_jitter=True,         # add randomness to avoid thundering herd
    max_retries=5,
)
def fetch_api_data(url: str) -> dict:
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    return resp.json()

# Manual retry with dead letter logic
@celery_app.task(bind=True, max_retries=3)
def process_payment(self, user_id: int, amount: float):
    try:
        gateway_response = call_payment_gateway(user_id, amount)
        if not gateway_response['success']:
            raise ValueError(gateway_response.get('error', 'unknown'))
        return gateway_response
    except Exception as exc:
        # Log each retry
        logger.error(f"Payment failed for user {user_id}: {exc}")
        if self.request.retries >= self.max_retries:
            # Dead letter: route to a special queue
            send_to_dead_letter_queue('payments', self.request)
            return {'status': 'failed', 'reason': str(exc)}
        raise self.retry(exc=exc, countdown=2 ** self.request.retries * 5)
Production Insight
A fintech task retried 10 times on a non-recoverable validation error. Each retry hit the slow payment gateway, causing cascading timeouts across workers.
Only retry on transient exceptions. Use autoretry_for with a carefully chosen list. For non-transient failures, log to a dead letter database or alert immediately.
Rule: retry is not a fix for incorrect data — it's a strategy for unreliable networks.
Key Takeaway
Retry only transient failures. Set max_retries to avoid infinite loops.
Use exponential backoff with jitter to avoid thundering herd.
Implement a dead letter queue for tasks that exhaust retries — don't let them vanish.

Canvas: Chains, Groups, Chords, and Complex Workflows

Celery's Canvas primitives let you compose multiple tasks into workflows. A chain runs tasks sequentially, passing the result of each to the next. A group runs tasks in parallel and returns a list of results. A chord is a group followed by a callback that receives the list of group results.

Production patterns
  • fan-out: use a group to send multiple notifications simultaneously.
  • pipeline: use a chain for sequential steps like validate → enrich → persist.
  • map-reduce: use a chord: group processes chunks, callback aggregates results.

Crucial insight: the chord callback only executes after all group tasks have completed and stored their results in the result backend. If the result backend is down, the chord never fires. Similarly, if any group task fails permanently, the chord blocks forever unless you set chord_size and implement timeouts.

Another gotcha: chain tasks share the same connection — a failure in the middle can leave the chain incomplete. Wrap each link in its own error handling.

canvas_examples.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# io/thecodeforge/celery_app/canvas_examples.py
from io.thecodeforge.celery_app import celery_app
from celery import chain, group, chord

# Chain: step1 -> step2 -> step3
result_chain = chain(
    io.thecodeforge.tasks.step1.s(data=100),
    io.thecodeforge.tasks.step2.s(),
    io.thecodeforge.tasks.step3.s()
)()
print(result_chain.get())  # final result

# Group: parallel execution
parallel_results = group(
    io.thecodeforge.tasks.process_chunk.s(chunk=1),
    io.thecodeforge.tasks.process_chunk.s(chunk=2),
    io.thecodeforge.tasks.process_chunk.s(chunk=3),
)()
print(parallel_results.get())  # [res1, res2, res3]

# Chord: parallel then callback
callback = chord(
    header=[
        io.thecodeforge.tasks.compute_metric.s(user_id=1),
        io.thecodeforge.tasks.compute_metric.s(user_id=2),
    ],
    body=io.thecodeforge.tasks.aggregate_results.s()
)()
print(callback.get())  # aggregated result

# Warning: chord body requires result backend. If backend is down, chord hangs.
# Always set a chord timeout:
from celery import current_app
current_app.conf.task_soft_time_limit = 120
Chord Gotcha: Result Backend Dependency
The chord primitive stores intermediate results in the result backend. If the result backend is unavailable or has high latency, the chord will block or fail. Always ensure your result backend is highly available if you use chords. For simple parallel tasks that don't need the aggregated result, use a group and handle results in the application layer.
Production Insight
A team used a chord to generate monthly reports. One group task failed due to a misconfigured dependency — the chord blocked waiting for it for 8 hours, consuming worker memory.
Set a chord timeout: use app.conf.task_soft_time_limit for the body task. For critical workflows, implement a timeout in your application code after body.get(timeout=minutes).
Rule: chords escalate failures from one subtask to the entire workflow. Use link_error callbacks or separate error queues.
Key Takeaway
Chains for sequential, groups for parallel, chords for parallel-then-aggregate.
Chords require a reliable result backend — monitor its health.
Always set timeouts on chords and handle partial failures explicitly.

Monitoring and Observability in Production

Reading logs isn't enough. You need to see queue depth, worker heartbeats, task latency percentiles, and error rates. The Celery ecosystem provides Flower for a web UI, but for production you'll want to integrate with Prometheus/Grafana or DataDog.

Essential metrics
  • Queue depth: number of unacknowledged messages per queue. Alert if > expected baseline.
  • Worker alive count: number of active workers. If drops below N, alert.
  • Task latency: time between task creation and execution start. Spikes indicate bottleneck.
  • Task error rate: percentage of tasks raising exceptions. Unexpected errors need investigation.
  • Result backend latency: if using a database as result backend, monitor query time.

Celery also has built-in events (celery events) and a stats endpoint (celery inspect stats). You can enable task ETA monitoring to catch tasks that miss their scheduled time.

Common blind spot: prefetch count. By default, a worker prefetches up to 4 tasks (worker_prefetch_multiplier=4). If tasks are imbalanced, one worker can hog tasks while others idle. Set to 1 for fair scheduling at a slight throughput cost.

monitoring.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# io/thecodeforge/celery_app/monitoring.py
from io.thecodeforge.celery_app import celery_app
from celery.events import Events
from celery.utils import uuid

# Quick health check
from celery.task.control import inspect

def worker_count() -> int:
    i = inspect()
    active_workers = i.active()
    return len(active_workers) if active_workers else 0

def queue_depth(queue_name: str = 'celery') -> int:
    # Requires broker-specific code; example for Redis
    import redis
    r = redis.Redis(host='localhost', port=6379, db=1)
    return r.llen(queue_name)

# Export to Prometheus via custom exporter
def export_task_metrics(task_name, latency_ms, status):
    # Pseudocode: push to Prometheus client
    pass

# Celery events for real-time monitoring
@celery_app.on_after_configure.connect
def setup_events(sender, **kwargs):
    sender.conf.task_events = True
    sender.conf.task_send_sent_event = True

# In deployment, run:
# celery -A io.thecodeforge.celery_app events --camera=<camera_class> --frequency=10
Production Insight
A team lost a batch of scheduled tasks because the clock on a worker VM drifted 30 seconds ahead. The ETA-based tasks were marked as late and never executed.
Sync your worker clocks with NTP. For time-sensitive tasks, use Redis as the scheduler backend (celery beat with Redis) rather than relying on system clocks.
Rule: monitor task ETA accuracy and worker clock skew if using schedule-based tasks.
Key Takeaway
Monitor queue depth, worker count, task latency, and error rate in production.
Use Flower for development, Prometheus/Grafana for production.
Fair worker distribution: set worker_prefetch_multiplier=1 for balanced load.
● Production incidentPOST-MORTEMseverity: high

The Disappearing Task: When Celery Silently Drops Tasks

Symptom
Tasks are submitted with .delay() but never appear in the worker logs. No error, no trace. The broker shows no pending messages eventually.
Assumption
The team assumed Celery guarantees delivery as long as the broker is reachable at submission time. They didn't configure a result backend or task tracking.
Root cause
The Redis broker has a maxmemory policy set to allkeys-lru. When memory pressure hits, Redis evicts the oldest unacknowledged task messages. Celery's broker transport (redis) doesn't persist messages by default — they exist only in memory.
Fix
Set Redis maxmemory-policy to noeviction for the Celery broker database. Use a separate Redis instance or database (e.g., db=1) dedicated to Celery, away from caching. Enable task result backend with result_expires to track task lifecycle.
Key lesson
  • Never assume in-memory brokers give durability. Always pair Celery with a persistent result backend for production.
  • Redis as a broker requires careful memory management — separate Redis instances for broker and cache.
  • Add monitoring: worker heartbeat, queue depth, and task time-to-execute alerts catch silent failures.
Production debug guideQuick triage for the most common production issues5 entries
Symptom · 01
Task submitted but never runs
Fix
Check worker is alive (celery -A proj status). Verify queue name matches worker's -Q flag. Inspect broker queue length (redis-cli llen celery or rabbitmqctl list_queues).
Symptom · 02
Task runs multiple times (duplicate execution)
Fix
Check for acks_late=True combined with worker pre-fetch. Switch to acks_late=False unless idempotency is guaranteed. Enable task visibility_timeout on Redis (default 1 hour).
Symptom · 03
Worker OOM or crash after a specific task
Fix
Enable task profiling with --profiler. Set worker_max_tasks_per_child to recycle workers after N tasks. Use task_soft_time_limit and task_time_limit to kill runaway tasks.
Symptom · 04
Tasks stuck in PENDING state forever
Fix
Ensure result backend is configured and reachable. Celery async results require a running result backend; if it's down, AsyncResult blocks indefinitely. Check database connection pool for the result backend.
Symptom · 05
High latency between submission and execution
Fix
Monitor queue depth. Add more workers or increase concurrency. Check if a long-running task is blocking the prefetch — set worker_prefetch_multiplier=1 to distribute fairly.
★ Celery Quick Debug Cheat SheetCommands and checks for the 5 most common Celery production failures
Task never executes
Immediate action
Restart worker with debug logging
Commands
celery -A io.thecodeforge.celery_app worker -l debug --concurrency=1
redis-cli -n 1 LLEN celery
Fix now
Purge the queue with celery -A io.thecodeforge.celery_app purge -f and resubmit the task
Task executed multiple times+
Immediate action
Check acks_late setting
Commands
celery -A io.thecodeforge.celery_app inspect active
redis-cli -n 1 CLIENT LIST | grep -i blocked
Fix now
Set task_acks_late = False in config unless idempotent
Worker crashes with MemoryError+
Immediate action
Set resource limits and restart with max tasks per child
Commands
celery -A io.thecodeforge.celery_app control rate_limit io.thecodeforge.celery_app.tasks.* 1/m
celery -A io.thecodeforge.celery_app inspect stats | grep total
Fix now
Add worker_max_tasks_per_child = 200 and worker_max_memory_per_child = 50000
AsyncResult.get() hangs forever+
Immediate action
Check if result backend is reachable
Commands
celery -A io.thecodeforge.celery_app inspect registered
psql -h dbhost -U user -c 'SELECT count(*) FROM celery_taskmeta'
Fix now
Set result_backend = 'db+sqlite:///results.db' for local testing to isolate the issue
Task timing out but not killed+
Immediate action
Verify soft and hard time limits
Commands
celery -A io.thecodeforge.celery_app inspect query
celery -A io.thecodeforge.celery_app status
Fix now
Set task_soft_time_limit = 300 (5 minutes) and task_time_limit = 600 in app config
Broker Comparison: Redis vs RabbitMQ for Celery
FeatureRedisRabbitMQ
Persistence (durable messages)Only if AOF/RDB enabled; memory-firstFully persistent with confirmed publishes
Throughput (default config)~10k tasks/sec~20k tasks/sec
High AvailabilityRedis Sentinel or ClusterMirrored queues, quorum queues
Latency (p99)~1ms~2ms
Ease of setupVery simple, single binaryRequires config, management UI
Security (RCE risk)Pickle still possibleSame risk if pickle enabled
Result backend usageNative support, fastNot recommended; use separate backend

Key takeaways

1
Celery decouples task production from execution via a broker
workers scale independently.
2
Choose the right broker
Redis for simplicity, RabbitMQ for reliability.
3
Never use pickle in production
it's a remote code execution vector.
4
Use separate queues for different task types to isolate failures.
5
Retry only transient exceptions; implement dead letter queues for permanent failures.
6
Monitor queue depth, worker count, and task latency
not just logs.
7
Canvas workflows (chains, groups, chords) are powerful but depend on a healthy result backend.

Common mistakes to avoid

5 patterns
×

Using pickle serializer in production

Symptom
A security audit reveals that workers can execute arbitrary Python code. An SSRF vulnerability in a web endpoint allows attackers to inject malicious task payloads.
Fix
Set task_serializer = 'json' and accept_content = ['json'] in your Celery app config. Migrate existing tasks to JSON or msgpack; use custom encoders for complex types.
×

Not configuring a result backend but then calling .get() on results

Symptom
Task returns successfully, but AsyncResult.get() raises OperationalError or hangs forever.
Fix
Either configure a result backend (e.g., Redis, database) or don't call .get(). For fire-and-forget tasks, omit the result backend entirely.
×

Ignoring worker prefetch multiplier for unbalanced tasks

Symptom
Some workers sit idle while others have 10+ tasks queued. Overall throughput is lower than expected.
Fix
Set worker_prefetch_multiplier = 1 to distribute tasks more evenly, especially if tasks vary in duration. Monitor with celery inspect active.
×

Retrying on non-transient exceptions (e.g., invalid input data)

Symptom
A task fails 5 times with the same ValueError before exhausting retries. Logs show repeated identical failures.
Fix
Only use autoretry_for with transient exceptions (network timeouts, database disconnects). For validation errors, log and fail fast. Implement dead letter queue for persistent failures.
×

Not setting timeouts on chords, causing workflows to block forever

Symptom
A chord with 10 subtasks hangs because one subtask never completes. The callback never fires, and the entire pipeline stalls.
Fix
Set task_soft_time_limit and task_time_limit on the chord body task. Implement external timeout in your application code for the chord result. Use link_error to handle subtask failures.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does Celery guarantee at-least-once delivery, and what scenarios can...
Q02SENIOR
Explain how Celery routing works with AMQP and how you would set up sepa...
Q03SENIOR
What are the security implications of using pickle as a serializer in Ce...
Q04SENIOR
Describe the difference between a chain, a group, and a chord in Celery ...
Q05SENIOR
Your Celery tasks are piling up in the queue but workers show zero memor...
Q01 of 05SENIOR

How does Celery guarantee at-least-once delivery, and what scenarios can cause duplicate execution?

ANSWER
By default, Celery acknowledges a task after execution (acks_late=False). If the worker crashes after acknowledgement but before the result is stored, the task is lost. With acks_late=True, the task is acknowledged after execution, so a crash during execution causes the task to be redelivered to another worker. This creates duplicates. Celery never guarantees exactly-once delivery; you must implement idempotency in your tasks. Duplicates can also occur if the broker (e.g., Redis) redelivers due to timeout or if you manually re-queue a task.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is Celery for Task Queues in Python in simple terms?
02
When should I use Celery vs just using Python threads or asyncio?
03
Can I use Celery with Django or FastAPI?
04
How do I monitor Celery in production?
05
How do I handle long-running tasks without blocking other tasks?
🔥

That's Python Libraries. Mark it forged?

6 min read · try the examples if you haven't

Previous
FastAPI Basics
17 / 51 · Python Libraries
Next
Pytest Fixtures