
Introduction to AutoSys

AutoSys is Broadcom's enterprise job scheduler used by banks, telecoms, and Fortune 500s to automate batch workloads.
In this tutorial, you'll learn
  • AutoSys is an enterprise workload automation platform for scheduling, monitoring, and managing batch jobs across multiple servers.
  • It solves the cross-server dependency and centralised monitoring problem that cron simply can't handle at scale.
  • Jobs are defined using JIL (Job Information Language), a scripting language specific to AutoSys.
[Figure: AutoSys Workflow, From Job Definition to Execution. Define job in JIL (script, schedule, machine, conditions) → Event Server stores it (persistent database of all definitions) → Event Processor evaluates (checks conditions every cycle) → Remote Agent executes (runs command on target machine) → Status reported back (SUCCESS / FAILURE / TERMINATED).]
Quick Answer
  • Enterprise workload automation for scheduling, dependency, and monitoring
  • Jobs defined via JIL (Job Information Language) — scripts, executables, DB calls
  • Centralised control across hundreds of servers from a single dashboard
  • Job chains: Job B runs only after Job A succeeds; retry and alert logic built-in
  • max_run_alarm raises an alarm on hung jobs before they block downstream for hours
  • Biggest mistake: treating it like cron — AutoSys has its own lifecycle and state machine
🚨 START HERE

AutoSys Job Failure Quick Debug Cheat Sheet

Commands that diagnose most production AutoSys issues, grouped by symptom
Job status unknown or not running as expected

Immediate Action: Check job definition and status
Commands:
autorep -J JOB_NAME -q
autorep -J JOB_NAME -d
Fix Now: If the job is ON ICE, release it with sendevent -E JOB_OFF_ICE -J JOB_NAME; to run it once right now, use sendevent -E FORCE_STARTJOB -J JOB_NAME.
Job failed with exit code but no log

Immediate Action: Get full event history
Commands:
autorep -J JOB_NAME -d | tail -50
autorep -J JOB_NAME -d | grep -i error
Fix Now: Check the job's stdout/stderr files on the agent machine (the paths set in std_out_file and std_err_file, if the job defines them).
Job stuck in RUNNING

Immediate Action: Kill the hung process
Commands:
sendevent -E KILLJOB -J JOB_NAME
autorep -J JOB_NAME
Fix Now: If KILLJOB doesn't work, SSH to the agent and kill the job's process directly.
Dependent job never starts even after upstream succeeds

Immediate Action: Verify the condition is met
Commands:
autorep -J UPSTREAM_JOB
autorep -J DOWNSTREAM_JOB -q | grep condition
Fix Now: If the condition uses success() (short form s()) but the upstream finished with a non-zero exit code, the condition will never be met. Change it to done() (short form d(), completion regardless of exit code) or add retry to the upstream job.
Production Incident

The Silent Pipeline Failure: A Hung Job Blocks Overnight Batch

A daily end-of-day settlement job ran fine for years. Then one day it never completed. No alert fired. By morning, the entire downstream chain — regulatory reports, reconciliation, file transfers — was stalled. The team only noticed when the morning shift logged in.
Symptom: Batch pipeline frozen at 3:17 AM. All downstream jobs were still waiting on their conditions and had never started. The upstream job showed RUNNING status but there was no CPU or I/O activity on the agent machine.
Assumption: The job was a simple database procedure call. Since it had run successfully thousands of times, the team assumed it would never hang.
Root cause: No max_run_alarm was configured on the job. The stored procedure entered an infinite loop due to a data anomaly (an unexpected NULL in a column used in the loop condition). AutoSys never terminated it because the job process was still alive, just not progressing.
Fix: Add max_run_alarm: 60 (minutes) to the JIL definition so an alarm is raised, and add term_run_time so the job is actually terminated after a set number of minutes. Set an alert on the alarm to page the on-call engineer. The team also added a guard in the stored procedure to detect and exit on NULL values.
Key Lesson
  • Always set max_run_alarm on every CMD job that calls external processes. Even 'simple' database calls can hang.
  • A job in RUNNING status doesn't mean it's making progress. Monitor CPU and database activity separately.
  • max_run_alarm on its own just warns. To have AutoSys kill the job, also set term_run_time.
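The fix from this incident can be sketched as a JIL definition wrapped in a small shell function. This is a minimal sketch, not the team's actual definition: the job name, script path, machine, owner and the 60/90 minute values are illustrative assumptions. It uses max_run_alarm to raise the alarm and term_run_time, the JIL attribute that terminates a job after a number of minutes, to stop the run. Check both attributes against the reference for your AutoSys release.

```shell
#!/usr/bin/env bash
# Print a JIL definition for a settlement job with a runtime alarm and a hard stop.
# All names, paths and minute values are illustrative assumptions.
settlement_jil() {
  cat <<'EOF'
/* Hypothetical end-of-day settlement job with hang protection */
insert_job: eod_settlement
job_type: CMD
command: /app/scripts/run_settlement.sh
machine: batch-srv-01
owner: batchuser
date_conditions: 1
days_of_week: mo,tu,we,th,fr
start_times: "02:00"
max_run_alarm: 60
term_run_time: 90
alarm_if_fail: 1
EOF
}

settlement_jil
```

To load it, redirect the output to a file and run `jil < eod_settlement.jil` on a host where the AutoSys client is installed.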
Production Debug Guide

Symptom → Action patterns for the most common production issues

Job shows RUNNING but no activity on agent machine → Run autorep -J JOB_NAME to confirm the current status, then autorep -J JOB_NAME -d for the events of the last run. Find the job's process on the agent and verify it is still alive. If the process exited but AutoSys didn't detect it, the agent may need a restart.
Job fails with status TERMINATED immediately after start → Run autorep -J JOB_NAME -d. Common causes: missing environment variables, incorrect working directory, or insufficient permissions. Check the agent machine's syslog for segfaults or permission denials.
Box job stuck in RUNNING for hours → A box completes only when the jobs inside it have finished. A box with no children stays RUNNING indefinitely, and so does a box with a child that is still running or ON HOLD. Run autorep -J BOX_NAME to list the box with its children and their statuses. If children are missing, re-insert them.
Daily job runs on wrong day or not at all → Check date_conditions and days_of_week in the JIL, and verify the calendar named in run_calendar (if any). Is the job ON ICE or ON HOLD? Use autorep -J JOB_NAME to see its current status.

If you've ever worked in an enterprise IT environment (banking, insurance, telecom, retail) you've probably heard someone say 'the AutoSys job failed at 2am.' AutoSys runs a large share of the enterprise world's batch processing. It has been doing this since the 1990s, most of that time as a CA Technologies product (CA is now part of Broadcom), and it's still running mission-critical ETL pipelines, payroll runs, and report generation at thousands of companies today.

The reason AutoSys stuck around isn't nostalgia. It's because it solves a real problem that simple cron jobs can't: running complex workflows where Job B depends on Job A, Job A might fail and need a retry, and you need a centralised dashboard to see what's happening across 200 servers at once.

What is AutoSys and what does it actually do

AutoSys is a workload automation platform. At its core it does three things: scheduling (run this job at 3am every weekday), dependency management (run this job only after that job succeeds), and monitoring (alert me if anything takes longer than expected or fails).

A 'job' in AutoSys can be any executable — a shell script, a Python script, a Java program, a database procedure call, or even just a system command. AutoSys doesn't care what the job does; it just controls when it runs and what happens next.

simple_jil_example.jil · JIL
/* A basic AutoSys job definition */
insert_job: daily_report
job_type: CMD
command: /opt/scripts/generate_report.sh
machine: prod-server-01
owner: svcaccount
/* date_conditions: 1 is required for days_of_week and start_times to take effect */
date_conditions: 1
days_of_week: mo,tu,we,th,fr
start_times: "06:00"
description: "Generates daily sales report"
▶ Output
/* Job inserted successfully into AutoSys database */
🔥AutoSys is now owned by Broadcom
AutoSys was long a CA Technologies product. Broadcom acquired CA in 2018. The product is now officially called Broadcom AutoSys Workload Automation, but most teams still just call it AutoSys.
📊 Production Insight
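If you define a CMD job without machine, jil rejects the definition with an error. If you omit owner, the job is owned by whoever ran jil, which is rarely the service account you intended.
Always validate JIL with the jil command in a non-production instance before deploying to production.
Rule: treat JIL like code and put it under version control.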
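One way to treat JIL like code is a small check that runs before a definition is loaded, for example from a pre-commit hook. The sketch below is a hypothetical helper, not part of AutoSys. It relies on one rule: start_times and days_of_week only take effect when date_conditions is switched on. It inspects the file as a whole, so it assumes one job per file.

```shell
#!/usr/bin/env bash
# lint_jil_schedule FILE
# Warn when a JIL file sets start_times without enabling date_conditions.
# Simplification: the file is checked as a whole, so keep one job per file.
lint_jil_schedule() {
  local file="$1"
  if grep -Eq '^[[:space:]]*start_times:' "$file" &&
     ! grep -Eq '^[[:space:]]*date_conditions:[[:space:]]*(1|y)' "$file"; then
    echo "WARN: $file sets start_times but not date_conditions: 1" >&2
    return 1
  fi
  return 0
}
```

A typical use would be `lint_jil_schedule daily_report.jil && jil < daily_report.jil`, so a definition that would silently never run on schedule is stopped before it reaches the database.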
🎯 Key Takeaway
AutoSys is a scheduler, executor, and monitor rolled into one.
JIL is the language you use to tell it what to run and when.
The simplest job needs: name, type, command, machine, and time.

Why enterprises use AutoSys instead of cron

Cron is great for simple, single-server scheduling. But AutoSys was built for a different scale. When you have hundreds of interdependent jobs running across dozens of servers, cron's limitations become painful fast.

AutoSys gives you: centralised control across all servers from one place, job dependency chains (job C only runs if job A and B both succeeded), a GUI to visualise job flows, automatic retry logic, alerting when jobs take too long or fail, audit trails for compliance, and the ability to put jobs on hold or ice without deleting them. Banks running end-of-day settlement processes can't afford to manage 500 cron entries across 30 servers manually.
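The dependency chain described above (job C runs only if jobs A and B both succeeded) might look like the following in JIL. This is a sketch under stated assumptions: the job names, script path and machine are hypothetical, and the retry count is an arbitrary example value.

```shell
#!/usr/bin/env bash
# Print JIL for a job that starts only after two upstream jobs succeed.
# Job names, script path and machine are illustrative assumptions.
job_c_jil() {
  cat <<'EOF'
/* job_c has no clock schedule; it waits for both upstream jobs */
insert_job: job_c
job_type: CMD
command: /opt/scripts/publish_results.sh
machine: prod-server-03
owner: svcaccount
condition: success(job_a) & success(job_b)
n_retrys: 2
alarm_if_fail: 1
EOF
}

job_c_jil
```

With cron, the equivalent would need hand-written lock files or polling scripts on every server involved; here the condition is a single attribute evaluated centrally.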

📊 Production Insight
Using cron across servers means you lose centralised visibility — a hung job on one server can block everything but you won't see it until the next report fails.
AutoSys's event server captures every state change; you can replay an entire night's history in seconds.
Rule: if your batch pipeline spans more than 3 servers, it's time to drop cron.
🎯 Key Takeaway
AutoSys centralises scheduling, dependency, and monitoring across machines.
Cron is per-server, per-script — it breaks at enterprise scale.
The killer feature: dependency conditions with status checks (success, failure, completion).

Who uses AutoSys in the real world

AutoSys is heavily used in industries that run large batch workloads on tight schedules: banking and financial services (end-of-day processing, regulatory reporting), insurance (claims processing, premium calculations), telecoms (billing runs, CDR processing), retail (inventory reconciliation, overnight pricing updates), and healthcare (claims adjudication, HL7 batch feeds).

If you're going for a role as a batch developer, ETL developer, production support engineer, or middleware/integration developer at any large enterprise, there's a solid chance AutoSys is in the stack.

📊 Production Insight
In banking, a single failed batch job can delay regulatory filings by a day, which is a compliance failure.
AutoSys jobs often run on dedicated batch servers that don't have production monitoring agents — you need AutoSys-specific alerting.
Rule: always add max_run_alarm and email notification to every production job.
🎯 Key Takeaway
AutoSys dominates in industries where batch reliability is business-critical.
If you work in enterprise IT, you'll likely encounter AutoSys.
Learn JIL basics — they transfer across all AutoSys deployments.

AutoSys Job Lifecycle and Key Concepts

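An AutoSys job moves through a defined lifecycle: INACTIVE → STARTING → RUNNING → SUCCESS (or FAILURE or TERMINATED). You can also place a job ON HOLD (it will not run until someone releases it with JOB_OFF_HOLD; if its starting conditions are already met at release, it starts right away) or ON ICE (it is taken out of the schedule; after JOB_OFF_ICE it waits for its starting conditions to occur again, and downstream jobs treat an iced job's condition as satisfied).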

Key concepts
  • status: current state of the job
  • condition: expression that controls when a job starts based on upstream job statuses
  • start_times: wall-clock time triggers
  • max_run_alarm: maximum allowed runtime before an alarm fires
  • box: a container job that groups jobs together for scheduling and visibility

Jobs are defined using JIL and stored in the AutoSys Event Server database.

box_jil_example.jil · JIL
/* Box job containing two child jobs with dependency */
/* The schedule lives on the box; date_conditions: 1 enables start_times */
insert_job: end_of_day_box
job_type: BOX
date_conditions: 1
days_of_week: all
start_times: "02:00"

/* Starts as soon as the box starts running */
insert_job: settlement
job_type: CMD
box_name: end_of_day_box
command: /app/scripts/run_settlement.sh
machine: batch-srv-01
owner: batchuser

/* Starts only after settlement succeeds */
insert_job: reconciliation
job_type: CMD
box_name: end_of_day_box
command: /app/scripts/run_recon.sh
machine: batch-srv-02
owner: batchuser
condition: success(settlement)
▶ Output
/* Box job 'end_of_day_box' inserted with two children */
Mental Model
Box jobs as folders
Think of a box job like a folder that holds files: the folder itself doesn't run, but it controls the visibility and lifecycle of everything inside it.
  • A box's status aggregates child statuses — if any child fails, the box shows FAILURE.
  • You can start/stop a box, and it cascades to all children.
  • Boxes can be nested, allowing hierarchical grouping of complex workflows.
📊 Production Insight
Box jobs are often misused as 'dummy parents'. If you put a condition on the box itself, no child can start until that condition is met and the box is RUNNING.
A box job with no children stays in RUNNING status indefinitely, because there is nothing inside it to complete.
Rule: children should have their own conditions; box-level conditions are for advanced orchestration only.
🎯 Key Takeaway
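ON ICE vs ON HOLD: both stop a job from running until an operator releases it. Off hold, a job whose conditions are already met starts immediately; off ice, it waits for its next scheduled occurrence. Jobs that depend on an ON HOLD job wait; jobs that depend on an ON ICE job proceed as if it had succeeded.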
max_run_alarm is your first line of defence against hung jobs — always set it.
Box jobs are organisational — don't overuse them for simple workflows.
When to use a Box job vs individual CMD jobs
If: Jobs are independent but belong to the same logical workflow
Use: Group them in a box for visibility and manual control.
If: Jobs share the same schedule and retry settings
Use: Put them in a box and set box-level attributes (like days_of_week).
If: You need to start a batch of jobs together but not based on any dependency
Use: A box job is perfect. Start the box and all children start on their own start_times.
If: Jobs are completely independent and don't need grouping
Use: Individual CMD jobs. No box needed.

Common Job Statuses and What They Mean in Production

AutoSys jobs report one of about 12 statuses. The ones you'll encounter most:

  • INACTIVE (IN): Job exists but hasn't run yet, or has been reset. Usually means it's waiting for its schedule or condition.
  • STARTING (ST): Job is being dispatched to the agent machine.
  • RUNNING (RU): Job is executing on the agent. This is where most hangs occur.
  • SUCCESS (SU): Job completed with exit code 0.
  • FAILURE (FA): Job completed with non-zero exit code.
  • TERMINATED (TE): Job was forcibly killed (by a user's KILLJOB event or by term_run_time). max_run_alarm alone only raises an alarm.
  • ON ICE (OI): Job is removed from the schedule until someone sends JOB_OFF_ICE. Downstream conditions treat it as satisfied.
  • ON HOLD (OH): Job is paused until someone sends JOB_OFF_HOLD. Downstream jobs wait for it.
  • RESTART (RE): Job could not start or failed and is scheduled to be restarted.
  • ACTIVATED (AC): The job's parent box is RUNNING, but the job itself has not started yet.
  • PEND_MACH (PE): Job is ready to run but its agent machine is offline or unavailable.

Knowing the status tells you exactly where to look next.
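As a sketch of "status tells you where to look", the function below maps a two-letter status code to a first diagnostic step. It is a hypothetical helper: the hints paraphrase this article and are not output from any AutoSys tool, and the code you pass in would come from the status column of an autorep report.

```shell
#!/usr/bin/env bash
# next_step_for_status CODE
# Map a two-letter AutoSys status code to a first diagnostic step.
# The hints paraphrase this article; they are not produced by AutoSys itself.
next_step_for_status() {
  case "$1" in
    ST) echo "check that the agent machine is reachable and the agent is up" ;;
    RU) echo "confirm the process is alive and making progress on the agent" ;;
    FA) echo "read the exit code and the job's stdout/stderr" ;;
    TE) echo "find out who or what killed the job" ;;
    OI|OH) echo "job is held back on purpose; confirm that is still intended" ;;
    PE) echo "check machine availability" ;;
    SU) echo "nothing to do" ;;
    *)  echo "unknown status code: $1"; return 1 ;;
  esac
}

next_step_for_status RU
```

Keeping the mapping in one place makes it easy to extend with site-specific runbook links as the team learns which causes are most common.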

⚠ Don't mistake ON HOLD for ON ICE
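Neither state releases itself: an operator must send JOB_OFF_HOLD or JOB_OFF_ICE. The difference is what happens next. A job taken off hold starts immediately if its starting conditions are already satisfied, and jobs downstream of a held job wait for it. A job taken off ice waits for its conditions to occur again, and jobs downstream of an iced job run as though it had succeeded. A common production mistake is releasing a hold without checking the job's conditions, causing it to fire unexpectedly at 3am.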
📊 Production Insight
A job stuck in STARTING for more than a few minutes usually means the agent machine is unreachable or the AutoSys agent is down.
A job that flips between RUNNING and TERMINATED repeatedly is probably being killed by an external watchdog.
Rule: use autorep -J JOB_NAME -d to see the last run's events and exit code; add -r -1 to report on the run before that.
🎯 Key Takeaway
Status + log = 90% of diagnosis.
SU (SUCCESS) means exit code 0; FA (FAILURE) means non-zero.
ON ICE and ON HOLD behave differently — learn the distinction.
| Feature | AutoSys | Cron | Windows Task Scheduler |
|---|---|---|---|
| Job dependencies | Yes (full condition logic) | No | Limited |
| Cross-server scheduling | Yes (centralised) | No (per server) | No (per machine) |
| GUI / dashboard | Yes (WCC) | No | Yes (basic) |
| Retry on failure | Yes (configurable) | No (manual) | Limited |
| Audit trail | Yes (full event log) | No | Limited |
| Alert on overrun | Yes (max_run_alarm) | No | No |
| Hold/suspend jobs | Yes (ON HOLD / ON ICE) | No (must delete) | Limited |

🎯 Key Takeaways

  • AutoSys is an enterprise workload automation platform for scheduling, monitoring, and managing batch jobs across multiple servers.
  • It solves the cross-server dependency and centralised monitoring problem that cron simply can't handle at scale.
  • Jobs are defined using JIL (Job Information Language), a scripting language specific to AutoSys.
  • It's heavily used in banking, insurance, telecoms, and other industries that run large overnight batch workloads.
  • AutoSys is now owned by Broadcom and actively developed as of 2026.
  • Always set max_run_alarm. Always test with the service account. Never confuse ON HOLD and ON ICE.

⚠ Common Mistakes to Avoid

    Treating AutoSys like cron
    Symptom

    Jobs fail or behave unexpectedly because you're using cron's mental model — like not respecting the job lifecycle or forgetting that conditions are evaluated by the event server, not the agent.

    Fix

    Learn AutoSys's state machine and JIL syntax. Realise that AutoSys jobs persist in the database and can be scheduled without being in a file. You don't 'edit a crontab' — you update JIL definitions.

    Forgetting that jobs run under a service account
    Symptom

    Permission denied errors on file reads, database connections, or script execution on first run. The job works when you test it manually (as yourself) but fails in AutoSys.

    Fix

    Always test your job command with the exact service account. Use sudo -u svcaccount /path/to/command in a test environment. Ensure the account has the required permissions on the agent machine and network.
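Before handing a command to AutoSys, a quick pre-flight check run as the service account can catch the permission errors described above. The function below is a minimal sketch with a hypothetical name; it only tests that the path exists and is executable by whoever runs the check, so it says nothing about database or network access.

```shell
#!/usr/bin/env bash
# can_run_job_command PATH
# Succeeds only if PATH is a regular file that the current user may execute.
# Run it as the service account to mirror how AutoSys will run the job.
can_run_job_command() {
  local cmd="$1"
  if [ ! -f "$cmd" ]; then
    echo "missing: $cmd" >&2
    return 2
  fi
  if [ ! -x "$cmd" ]; then
    echo "not executable by $(id -un): $cmd" >&2
    return 1
  fi
  return 0
}

# Demo against a path that exists on most Unix systems.
can_run_job_command /bin/sh && echo "ok: /bin/sh"
```

Saved in a hypothetical file check_job.sh, it could be run as the job owner with `sudo -u svcaccount bash -c '. ./check_job.sh; can_run_job_command /opt/scripts/generate_report.sh'`.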

    Not setting max_run_alarm
    Symptom

    A job that hangs (e.g., infinite loop in a script) runs forever. Downstream jobs are blocked waiting for it. No alert fires until morning when someone notices the pipeline is frozen.

    Fix

    Add max_run_alarm to every CMD job. Set a reasonable value based on historical run times plus a buffer. For critical jobs, also set term_run_time so the job is terminated rather than just flagged.

    Confusing ON HOLD and ON ICE
    Symptom

    A job taken off ON HOLD starts immediately because its conditions were already met, causing an unexpected execution at 3am. Or a job taken off ON ICE does not run, even though all its conditions were satisfied while it was iced.

    Fix

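    Memorise: ON HOLD = paused, blocks downstream jobs, and runs on release if its conditions are already met; ON ICE = removed from the schedule, skipped by downstream conditions, and waits for its next occurrence after release. To pause a job and everything that depends on it, use ON HOLD. To skip a job while letting the rest of the chain run, use ON ICE.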

    Not using conditions correctly with status characters
    Symptom

    Condition success(jobA) is never met because jobA exited with non-zero, although you only care that it completed. Or a condition joins two jobs with | (or) where & (and) was intended, so the downstream job starts before both upstream jobs are ready.

    Fix

    Learn the condition syntax: success (s), failure (f), done (d, completion with any exit code), terminated (t), notrunning (n), and exitcode (e, compared against a value). Use done when you don't care about the exit code. Combine terms with & (and) and | (or).
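To make the condition keywords concrete, here is a small sketch that builds condition terms from the long status names AutoSys documents (success, failure, done, terminated, notrunning) and prints them in their one-letter short form. The function name and the job names are hypothetical.

```shell
#!/usr/bin/env bash
# condition_for STATUS JOB
# Print a JIL condition term using the one-letter short form of STATUS.
condition_for() {
  local status="$1" job="$2" letter
  case "$status" in
    "success")    letter=s ;;
    "failure")    letter=f ;;
    "done")       letter=d ;;
    "terminated") letter=t ;;
    "notrunning") letter=n ;;
    *) echo "unsupported status: $status" >&2; return 1 ;;
  esac
  printf '%s(%s)\n' "$letter" "$job"
}

# Downstream job should run once the upstream finishes, whatever its exit code.
echo "condition: $(condition_for done load_job)"
# Both upstream jobs must succeed.
echo "condition: $(condition_for success job_a) & $(condition_for success job_b)"
```

The first line printed is the fix for a downstream job that never starts because its upstream exits non-zero; the second is the stricter form where every upstream job must succeed.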

Interview Questions on This Topic

  • Q: What is AutoSys and what problems does it solve that cron cannot? (Junior)
    AutoSys is an enterprise workload automation platform. It solves cross-server dependency management, centralised monitoring, retry logic, audit trails, and complex scheduling that cron cannot handle at scale. Cron is per-server, has no dependency model, no central dashboard, and no built-in alerting.
  • Q: Name the three types of jobs in AutoSys. (Junior)
    CMD (any command or script), BOX (container for grouping jobs), and File Watcher (monitors a file for arrival/changes and triggers a downstream job). File Watcher is less common but exists.
  • Q: What is JIL and how is it used to define jobs? (Mid-level)
    JIL stands for Job Information Language. It's a simple, declarative scripting language specific to AutoSys. You use JIL commands like insert_job, update_job, delete_job to define or modify job attributes. JIL is submitted through the jil command-line tool or loaded from a file.
  • Q: In what industries is AutoSys most commonly used and why? (Junior)
    Banking, insurance, telecom, retail, healthcare: any industry with large overnight batch workloads that require precise sequencing, retry logic, and compliance auditing. Examples: end-of-day settlement, regulatory reports, billing runs, inventory updates.
  • Q: What is the Event Server and what does it store? (Mid-level)
    The Event Server (or AutoSys Database) is a relational database that stores all job definitions, scheduled events, run history, and statuses. The Event Processor reads events from it, evaluates starting conditions, and dispatches jobs to the Remote Agents, which report status back. If the database is down, no jobs start.
  • Q: What is the difference between ON HOLD and ON ICE? (Senior)
    Both keep a job from running until an operator releases it. ON HOLD: downstream jobs wait for it, and when it is taken off hold it starts immediately if its conditions are already satisfied. ON ICE: the job is removed from the schedule, downstream jobs treat its condition as satisfied, and when it is taken off ice it waits for its conditions to occur again. This is a common cause of production incidents when engineers confuse the two.

Frequently Asked Questions

Is AutoSys still relevant in 2026?

Yes. While newer tools like Apache Airflow are popular in data engineering, AutoSys remains dominant in traditional enterprise environments — especially banking, insurance, and telecom — where it manages mission-critical batch workloads that have been running reliably for years.

Do I need to know Linux to use AutoSys?

A working knowledge of Linux/Unix commands helps significantly since most AutoSys jobs run shell scripts. But the AutoSys web UI, WCC (Workload Control Center), can be used without deep Linux knowledge for monitoring and job management.

What is the difference between CA AutoSys and Broadcom AutoSys?

They're the same product. CA Technologies developed AutoSys, and Broadcom acquired CA in 2018. The product continues under Broadcom as AutoSys Workload Automation. Version numbers and feature names have stayed largely consistent.

Can AutoSys run Python scripts?

Yes. Any executable that can be run from the command line can be a CMD job in AutoSys — Python scripts, shell scripts, Java jars, database commands, and more. AutoSys simply calls the command on the target machine.

How do I debug a job that fails with exit code 1?

First run autorep -J JOB_NAME -d to see the last run's events and exit code, then read the job's stdout/stderr files on the agent. If that doesn't help, check the agent machine's log at /tmp/... (auto user's home)/logs. If the job uses environment variables, verify they are defined in the AutoSys profile or JIL.

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
