Everything you need to master CA AutoSys Workload Automation — from JIL syntax to production incident recovery. Written for engineers who actually run these systems at scale.
AutoSys (officially CA Workload Automation AE, owned by Broadcom) is an enterprise job scheduler built for complex workloads across distributed systems — batch jobs, file watchers, multi-step workflows, and event-driven pipelines. It's the backbone of nightly batch processes in investment banks, insurance companies, telcos, and healthcare — the infrastructure behind payroll runs, bank settlements, ETL pipelines, and regulatory reporting at 2am. If you've done enterprise application support and something broke in the night, there's a fair chance AutoSys was somewhere in the chain.
In most enterprises it's the central platform for workload automation: the thing that coordinates hundreds of jobs across dozens of servers and makes sure they run in the right order, on the right machine, at the right time. You'll also encounter it referred to as Workload Automation AE. When it works, nobody notices. When it breaks, everything stops.
What separates AutoSys from cron is that jobs are event-driven, not just time-driven. A job doesn't fire because the clock hit 22:00; it runs when all of its conditions are true: another job succeeded, a file landed in a directory, a global variable was set by an upstream process, or an external system sent a trigger. That event management capability is what makes AutoSys the coordination layer for complex workflows where ordering, cross-server dependencies, and integration between systems all matter.
| Feature | Cron | AutoSys | Control-M | Automic | Apache Airflow |
|---|---|---|---|---|---|
| Job dependencies | None | Full condition logic | Full condition logic | Full condition logic | DAG-based dependencies |
| Cross-server jobs | Per-server only | Remote agents | Remote agents | Remote agents | Celery / Kubernetes executors |
| File watchers | No | Native FW job type | Native | Native | Sensors (FileSensor) |
| Alerting | Manual setup | Built-in alarms | Built-in | Built-in | Callbacks + PagerDuty hooks |
| Restart/recovery | Manual | sendevent commands | GUI + commands | GUI + commands | Task retry + backfill |
| Audit trail | Logs only | Full event history in DB | Full event history | Full event history | Metadata DB + UI logs |
| Licensing | Free / OS | Commercial (Broadcom) | Commercial (BMC) | Commercial (Broadcom) | Open-source (Apache 2.0) |
| Pipeline definition | crontab | JIL | GUI / XML | GUI / scripts | Python DAGs |
| Typical use | Simple scheduled tasks | Enterprise batch pipelines | Enterprise batch pipelines | Enterprise automation / SAP workloads | Data engineering / ML pipelines |
The comparison table above shows the surface differences. The real question enterprises face in 2025–2026 is whether to keep AutoSys for existing batch pipelines or migrate to Airflow for new workloads — or run both. Here's the honest answer based on what's actually happening in production environments.
| Dimension | AutoSys | Apache Airflow | When it matters |
|---|---|---|---|
| Job definition | JIL — declarative text, version-controlled, environment-portable | Python DAGs — full programming language, more expressive but harder to audit | Compliance-heavy shops prefer JIL's audit trail. Data engineering teams prefer Python DAGs. |
| Cross-server execution | Native — agents on any OS, no extra config | Requires Celery/K8s executor setup and worker infrastructure | Legacy mixed-OS environments (Linux + Windows + AIX) — AutoSys wins cleanly |
| File-based triggers | Native FW job type — no code required | FileSensor — requires Python, polling config, and operator knowledge | File-arrival pipelines in banking/insurance — AutoSys is simpler to operate |
| Failure recovery | sendevent FORCE_STARTJOB with -D date flag — ops team can recover without dev involvement | Clear/re-run from UI or CLI — requires understanding DAG state | 24/7 ops teams without Python skills — AutoSys recoveries are faster to execute |
| ML/data pipelines | Possible via CMD jobs calling Python — awkward | Native — PythonOperator, SparkSubmitOperator, dbt integration | New ML workloads should go on Airflow. Retrofitting AutoSys is painful. |
| Scalability | EAS is a single-process bottleneck — high job counts require careful tuning | Horizontally scalable with Celery/K8s workers | 10,000+ concurrent tasks — Airflow scales better |
| Vendor dependency | Broadcom licensing — price increases post-VMware acquisition have accelerated migration interest | Apache 2.0 — no licensing cost | Cost-reduction initiatives — Airflow migration is increasingly common |
| Operational complexity | Low for ops teams — JIL is approachable, CLI tools are consistent | Higher — requires Python knowledge across the ops team | Shops without dedicated platform engineering — AutoSys is easier to hand off |
CA Workload Automation AE has three main components: the Event Server (EAS), Remote Agents, and the AutoSys database. Understanding how they interact is essential for diagnosing failures and designing reliable pipelines. Together they give you end-to-end visibility of workloads running across your entire infrastructure — from on-premises Linux servers to cloud agents.
The Event Server (EAS) is the scheduler's brain — it polls the database for new events, evaluates which jobs are ready to run based on their conditions, and dispatches them to agents. If the EAS process dies, nothing runs. Jobs don't error out; they just stop. This is why the first thing you check during any "why isn't this running?" investigation is whether EAS is actually alive: ps -ef | grep EAS.
Agents are stateless daemons on target machines. They wait for dispatch instructions from the EAS, execute the job command under the configured owner account, and write the exit code back to the database. That's it. They don't have opinions about what they're running — if the command path is wrong or the owner doesn't exist, the job silently hangs in STARTING. The agent has no way to tell the EAS "I tried but couldn't start this."
Symptom: Jobs targeting a specific machine are stuck in STARTING state for 10+ minutes, then fail with "connection refused."
Root cause: The Remote Agent daemon on the target machine crashed after an OS patch restart. The Event Server couldn't reach it.
Fix: SSH to the target machine, check agent status with ps -ef | grep cybAgent, restart with service cybagent restart, then force-start the affected jobs.
Prevention: Add agent health monitoring to your infrastructure alerting. A dead agent is silent — it doesn't alert AutoSys directly.
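As a starting point for that monitoring, here is a minimal sketch of a probe loop. It assumes `autoping -m MACHINE` returns a non-zero exit status when the agent is unreachable; the function name `dead_agents` and the idea of passing the probe command as an argument are illustrative choices, not part of AutoSys.

```shell
# Print the name of every machine whose agent does not answer the probe.
# $1: probe command, for example "autoping -m" (left unquoted so it word-splits)
# remaining arguments: machine names to check
dead_agents() {
    da_probe=$1
    shift
    for da_machine in "$@"; do
        if ! $da_probe "$da_machine" >/dev/null 2>&1; then
            printf '%s\n' "$da_machine"
        fi
    done
}

# Example (not executed here):
#   dead_agents "autoping -m" db-extract-01 etl-proc-01 dw-loader-01
```

Any machine name printed by the function is a candidate for an alert. Run it from cron or your monitoring agent rather than from AutoSys itself, so the check still fires when the scheduler is the thing that is down.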
JIL (Job Information Language) is the scripting language used to define every object in AutoSys — jobs, boxes, file watchers, global variables, machine registrations. Every batch job, every file watcher, every scheduling rule in the system is defined using Job Information Language. It looks like a config file format, not a programming language, but don't underestimate it. A single missing attribute or a wrong machine alias can cause a job to silently never run. Learn to read JIL fluently; learn to write it carefully.

    /* BOX: container for the entire EOD pipeline */
    insert_job: BOX_PAYMENT_EOD
    job_type: BOX
    owner: svc_autosys
    permission: gx,ge,wx,we,mx,me
    date_conditions: 1
    days_of_week: mo,tu,we,th,fr
    start_times: "22:00"
    timezone: GMT
    description: "End-of-day payment settlement pipeline"
    alarm_if_fail: 1
    alarm_if_terminated: 1


    /* Step 1: Extract. No condition, runs when box starts */
    insert_job: JOB_EXTRACT_TXN
    job_type: CMD
    box_name: BOX_PAYMENT_EOD
    command: /opt/etl/extract_transactions.sh
    machine: db-extract-01
    owner: svc_autosys
    std_out_file: /logs/autosys/extract_txn.out
    std_err_file: /logs/autosys/extract_txn.err
    max_run_alarm: 60

    /* Step 2: Transform. Depends on extract success */
    insert_job: JOB_TRANSFORM_TXN
    job_type: CMD
    box_name: BOX_PAYMENT_EOD
    command: /opt/etl/transform.sh
    machine: etl-proc-01
    owner: svc_autosys
    condition: success(JOB_EXTRACT_TXN)
    max_run_alarm: 45

    /* Step 3: Load. Depends on transform, alerts on failure */
    insert_job: JOB_LOAD_TXN
    job_type: CMD
    box_name: BOX_PAYMENT_EOD
    command: /opt/etl/load_to_dw.sh
    machine: dw-loader-01
    owner: svc_autosys
    condition: success(JOB_TRANSFORM_TXN)
    alarm_if_fail: 1
    max_run_alarm: 30
    n_retrys: 2
    retry_interval: 5


    /* File watcher: triggers when file arrives */
    insert_job: FW_PAYMENT_FILE
    job_type: FW
    box_name: BOX_PAYMENT_EOD
    watch_file: /data/incoming/payment_*.csv
    watch_interval: 60
    owner: svc_autosys
    machine: file-drop-01
    min_file_size: 1

    /* Process job runs only after file arrives */
    insert_job: JOB_PROCESS_PAYMENT
    job_type: CMD
    box_name: BOX_PAYMENT_EOD
    command: /opt/payment/process.sh
    machine: proc-01
    condition: success(FW_PAYMENT_FILE)

Every AutoSys job cycles through a series of status codes. Knowing what each one means — and what caused it — is the core skill for on-call AutoSys support.
| Status | Meaning | Common Cause |
|---|---|---|
| INACTIVE | Job exists but has never run in this cycle | Normal initial state |
| ACTIVATED | Box started, job is waiting for its conditions | Normal — waiting for dependencies |
| STARTING | Event Server dispatching to remote agent | Normal, should be brief (<30s) |
| RUNNING | Executing on remote agent | Normal during execution |
| SUCCESS | Completed with exit code 0 | Normal completion |
| FAILURE | Completed with non-zero exit code | Script error, bad data, permissions |
| TERMINATED | Killed via sendevent KILLJOB | Manual intervention or term_run_time exceeded (max_run_alarm only raises an alarm) |
| ON_HOLD | Held by operator — will not run | Manual hold, maintenance window |
| ON_ICE | Frozen — invisible to scheduler | Manual skip, downstream proceeds |
| RESTART | Waiting to be restarted after failure | n_retrys configured, retry pending |
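
When reading `autorep -s` output you will usually see these statuses as two-letter codes in the ST column rather than spelled out. The lookup below is a small sketch of that mapping; the abbreviations are the ones commonly printed by autorep, but treat them as an assumption and confirm against the output of your own release. The function name `status_name` is illustrative.

```shell
# Translate a two-letter autorep ST code into the status name used in the table.
status_name() {
    case $1 in
        IN) echo INACTIVE ;;
        AC) echo ACTIVATED ;;
        ST) echo STARTING ;;
        RU) echo RUNNING ;;
        SU) echo SUCCESS ;;
        FA) echo FAILURE ;;
        TE) echo TERMINATED ;;
        OH) echo ON_HOLD ;;
        OI) echo ON_ICE ;;
        RE) echo RESTART ;;
        *)  echo UNKNOWN ;;
    esac
}
```

A helper like this is handy in wrapper scripts that turn raw autorep output into alert text an on-call engineer can read at a glance.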
When something breaks in production, sendevent is how you fix it. It's AutoSys's event management interface — every manual intervention, every status change, every pipeline recovery goes through this command. Learn these cold. You will not have time to look them up at 3am with a box stuck, an SLA breaching, and stakeholders pinging you.
Scenario: JOB_LOAD_TXN failed at 23:45 due to a tablespace issue. The DBA fixed the tablespace and manually inserted the data directly. The EOD box is stuck waiting for the load job to succeed.
Wrong fix: sendevent -E FORCE_STARTJOB -J JOB_LOAD_TXN — this re-runs the load script, which will attempt to insert the data again and create duplicates.
Correct fix: sendevent -E CHANGE_STATUS -J JOB_LOAD_TXN -s SUCCESS — marks the job as succeeded without running it, allowing downstream jobs to proceed. No duplicate data.
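The decision in this scenario comes down to one question: has the job's work already been done by hand? The sketch below encodes that choice and prints the matching command instead of running it, so it can be pasted into a runbook or reviewed before execution. The function name `recovery_command` and its yes/no argument are hypothetical; the two sendevent invocations are the ones shown above.

```shell
# Print the sendevent command for recovering a failed job.
# $1: job name
# $2: "yes" if the job's work was already completed manually, anything else otherwise
recovery_command() {
    if [ "$2" = "yes" ]; then
        # Work is done: mark the job SUCCESS without re-running it.
        printf 'sendevent -E CHANGE_STATUS -J %s -s SUCCESS\n' "$1"
    else
        # Work is not done: re-run the job.
        printf 'sendevent -E FORCE_STARTJOB -J %s\n' "$1"
    fi
}
```

For the tablespace incident, `recovery_command JOB_LOAD_TXN yes` prints the CHANGE_STATUS command, which unblocks the box without loading the data a second time.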
autorep is your read-only window into the AutoSys database — job status, run history, definitions, machines, calendars, global variables. It's the primary monitoring tool for workload automation on the command line, giving you full visibility into what's running, what failed, and what's scheduled next. You can't break anything with it. Run it freely, run it often.

    # Show current status of a specific job
    autorep -J JOB_LOAD_TXN -s

    # Show all jobs in a box with their current status
    autorep -J BOX_PAYMENT_EOD -s

    # Show job definition (JIL attributes)
    autorep -J JOB_LOAD_TXN -d

    # Show run history for a job (last 10 runs)
    autorep -J JOB_LOAD_TXN -s -t

    # Show all jobs whose latest status is FAILURE
    # (the ST column prints two-letter codes, so match FA rather than FAILURE)
    autorep -J % -s | grep -w FA

    # Show all jobs on a specific machine
    autorep -M db-extract-01 -s

    # Show all currently RUNNING jobs (ST column code RU)
    autorep -J % -s | grep -w RU

    # Show jobs that ran between specific times
    autorep -J BOX_PAYMENT_EOD -s -t -S 2200 -E 2359

    # Export job definition back to JIL format
    autorep -J BOX_PAYMENT_EOD -q

% wildcard matches all jobs. Pipe to grep to filter by status. Add | wc -l to get a count.

Most engineers learn AutoSys through the command line, which is fine: the CLI is faster for day-to-day support. But the AutoSys Web UI (formerly Workload Control Center / CA WCC) is worth knowing because it's what business stakeholders, junior operators, and managers use. It provides real-time visibility across all your workloads (job status, alerts, dependencies, overdue jobs) from a single browser interface. In some shops it's also how job definitions get submitted via Quick Edit. If you're supporting a team, you need to know both.
The AutoSys Web UI connects to one or more Event Server instances and gives you a browser-based view of the same data autorep shows on the command line. The underlying database is identical — Web UI doesn't have its own state.
| Panel | What It Does | CLI Equivalent |
|---|---|---|
| Monitoring | Live view of job status across instances. Spot overdue jobs, set up alert subscriptions, filter by box, machine, or status. | autorep -J % -s |
| Quick View | Single-job deep dive — definition, conditions, run history, logs, and flow diagram. Send events directly from here without touching the CLI. | autorep -J job -f -d |
| Quick Edit | Edit a job definition directly from the AutoSys Web UI — change attributes, conditions, or scheduling without writing JIL manually. | jil (update_job) |
| Application Editor | Visual dependency graph — shows which jobs depend on which, conditions, and successor chains. Invaluable for impact analysis before changes. | job_depends -J box |
| Enterprise Command Line | Browser-based terminal. Run jil, autorep, sendevent on the connected server without SSH access. | Direct CLI |
| Forecast | Shows upcoming scheduled runs for a job or box. Use it to spot overdue jobs before they become incidents and validate calendar logic before go-live. | forecast -J box -t 30 |
| Resources | Visual resource utilisation — how many units are consumed vs available right now. | autorep -r RESOURCE -s |
| Credentials | Manage per-server authentication credentials. Add/update/validate the owner account passwords that agents use to execute jobs. | N/A (admin only) |
The fields in Web UI map directly to JIL attributes; there's no separate "Web UI format." When you save a job definition in Web UI, it writes JIL to the database exactly the same as jil < script.jil would. You can define a job in Web UI and export it as JIL with autorep -J JOB_NAME -q.
One gotcha: Web UI may not expose every JIL attribute through its forms — particularly custom or advanced attributes. For anything beyond standard CMD/BOX/FW job types, use JIL directly. Don't assume every attribute is accessible through the GUI.
These are the failure patterns that repeat across every enterprise AutoSys environment. Batch processes fail for predictable reasons — and once you've seen each of them once, you diagnose them in minutes instead of hours. Not theory — this is what actually breaks at 2am and costs operational efficiency when it does.
Nine times out of ten, the box has a condition attribute of its own — referencing a job outside the box that hasn't completed yet. Run autorep -J BOX_NAME -d and look at the box's own condition line, not just its children. If you're in the AutoSys Web UI, Quick View on the box shows the same information with a visual flow diagram that makes cross-box dependencies immediately obvious. The other case is a child job added mid-cycle that never ran. Both look identical from the monitoring view.
Start with autoping -m MACHINE_NAME. If it times out, the agent is dead — restart it and move on. If autoping succeeds, the agent is alive but still not executing. At that point, check whether the owner account exists on the target machine (ssh machine id svc_autosys). A missing owner is the most common cause of a persistent STARTING hang that survives an agent restart. If the owner exists, check that it has execute permission on the command path.
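The checks above form a short decision tree. The sketch below writes it down as a function that takes the outcome of each check and prints the next action; the function name and the "ok" convention for arguments are hypothetical, and the wording of the actions is a paraphrase of the steps in the text.

```shell
# Suggest the next step for a job stuck in STARTING.
# $1: "ok" if autoping to the machine succeeded
# $2: "ok" if the owner account exists on the target machine
# $3: "ok" if the owner can execute the command path
next_step_for_starting_hang() {
    if [ "$1" != "ok" ]; then
        echo "restart the agent"
    elif [ "$2" != "ok" ]; then
        echo "create or fix the owner account"
    elif [ "$3" != "ok" ]; then
        echo "grant execute permission on the command path"
    else
        echo "escalate: agent, owner and permissions all check out"
    fi
}
```

The order matters: an agent restart is cheap and rules out the most visible cause, so it comes before the account and permission checks.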
Check global variables first — autorep -G %. If a condition includes value(GVAR_NAME) != "" and the variable is empty or stale, the job will sit in ACTIVATED indefinitely with no error. After that, check the Event Server itself: ps -ef | grep EAS. A live EAS process that can't reach the database queues everything internally without dispatching. Check /opt/CA/WorkloadAutomationAE/SystemAgent/logs/ for connection errors.
Symptom: End-of-month jobs didn't run. No failures, just no execution. Jobs show INACTIVE.
Root cause: The box had run_calendar: BUSINESS_DAYS and the last day of the month fell on a Saturday. The calendar had no entry for that date, so the box never activated.
Fix: Updated the calendar definition to include the last business day of each month using extended_calendar attributes. Manually force-started the box for the missed date using sendevent -E FORCE_STARTJOB -J BOX_EOM_REPORTS -D 20260330.
Lesson: Always run autorep -c CALENDAR_NAME -t against any new calendar and manually count the expected dates before it goes live. Month-end calendars are wrong more often than you'd expect.
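Counting expected dates by hand is error-prone, so it helps to compute them independently and compare against the calendar. The following is a minimal sketch that prints the last weekday of a month using plain shell arithmetic (Sakamoto's day-of-week method). It assumes a Monday-to-Friday business week and knows nothing about public holidays; all function names are illustrative.

```shell
# Day of week for a Gregorian date: 0 = Sunday ... 6 = Saturday.
# Arguments: year month day, written without leading zeros.
day_of_week() {
    dw_y=$1
    case $2 in
        1) dw_t=0 ;; 2) dw_t=3 ;; 3) dw_t=2 ;; 4) dw_t=5 ;;
        5) dw_t=0 ;; 6) dw_t=3 ;; 7) dw_t=5 ;; 8) dw_t=1 ;;
        9) dw_t=4 ;; 10) dw_t=6 ;; 11) dw_t=2 ;; 12) dw_t=4 ;;
    esac
    if [ "$2" -lt 3 ]; then dw_y=$((dw_y - 1)); fi
    echo $(( (dw_y + dw_y / 4 - dw_y / 100 + dw_y / 400 + dw_t + $3) % 7 ))
}

# Number of days in a month. Arguments: year month.
days_in_month() {
    case $2 in
        4|6|9|11) echo 30 ;;
        2)
            if [ $(( $1 % 4 )) -eq 0 ] && { [ $(( $1 % 100 )) -ne 0 ] || [ $(( $1 % 400 )) -eq 0 ]; }; then
                echo 29
            else
                echo 28
            fi ;;
        *) echo 31 ;;
    esac
}

# Print the last Monday-to-Friday date of a month as YYYYMMDD.
last_business_day() {
    lb_day=$(days_in_month "$1" "$2")
    case $(day_of_week "$1" "$2" "$lb_day") in
        6) lb_day=$((lb_day - 1)) ;;   # Saturday -> Friday
        0) lb_day=$((lb_day - 2)) ;;   # Sunday -> Friday
    esac
    printf '%04d%02d%02d\n' "$1" "$2" "$lb_day"
}
```

For example, January 2026 ends on a Saturday, so `last_business_day 2026 1` prints `20260130`. Looping over the twelve months gives a list you can diff against the calendar report before go-live.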
Global variables are key-value pairs stored in the AutoSys database, readable and writable by any job on any machine at runtime. They're how you pass data between jobs in a pipeline — the run date, an input file path, a record count — without hardcoding values or relying on temp files on a shared filesystem.

    /* Define a global variable in JIL */
    insert_global: GVAR_RUN_DATE
    value: "20260417"

    insert_global: GVAR_PAYMENT_FILE
    value: ""

    /* Reference a global variable in a job command */
    insert_job: JOB_EXTRACT_TXN
    job_type: CMD
    command: /opt/etl/extract.sh $$GVAR_RUN_DATE
    machine: db-extract-01

    /* Use global variable in a condition */
    insert_job: JOB_LOAD_TXN
    job_type: CMD
    command: /opt/etl/load.sh $$GVAR_PAYMENT_FILE
    condition: success(JOB_EXTRACT_TXN) AND value(GVAR_PAYMENT_FILE) != ""


    # Set a global variable from within a running job script
    # (SET_GLOBAL takes a single name=value pair after -G)
    sendevent -E SET_GLOBAL -G "GVAR_PAYMENT_FILE=/data/incoming/payment_20260417.csv"

    # Set from command line (operator intervention)
    sendevent -E SET_GLOBAL -G "GVAR_RUN_DATE=20260417"

    # Read current value
    autorep -G GVAR_RUN_DATE

    # List all global variables
    autorep -G %

Conditions can read global variables with the value(GVAR_NAME) syntax. This lets you gate job execution on data state, not just job completion state: a powerful pattern for file-driven pipelines where you need to verify a file path was actually set before attempting to process it.

Problem: Global variables persist across box runs. If last night's run set GVAR_PAYMENT_FILE to a stale path and tonight's file watcher failed silently, the load job may process yesterday's file again.
Fix: Always reset critical global variables to empty at the start of each box run using a dedicated reset job as the first step, before any file watchers or extract jobs run.
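A reset job can itself fail or be skipped, so a second line of defence inside the load script is cheap insurance. The sketch below is one possible guard: it refuses any file path that does not contain tonight's run date. It assumes, as in the examples above, that file names embed the date as YYYYMMDD; the function name is hypothetical.

```shell
# Succeed only if the file path mentions the expected run date.
# $1: path read from GVAR_PAYMENT_FILE
# $2: run date as YYYYMMDD
path_matches_run_date() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        return 1          # an empty path or date is never acceptable
    fi
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example use at the top of a load script (not executed here):
#   path_matches_run_date "$1" "$RUN_DATE" || { echo "stale or empty input path" >&2; exit 1; }
```

Exiting non-zero puts the job into FAILURE, which is far easier to spot and recover from than a silent reload of yesterday's file.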
AutoSys Workload Automation security gets ignored until something goes wrong — a developer accidentally force-starts a production settlement run, or an audit finds stale accounts with Admin access that nobody revoked. The security model has two layers that are worth understanding separately: JIL permissions (per-job, defined in the job definition) and EEM (centralized role-based access, mapped to your LDAP/AD groups).
Every job definition includes a permission attribute that controls access at the job level. Permissions are set as a comma-separated list of access codes.

    insert_job: JOB_PAYMENT_EOD
    job_type: CMD
    owner: svc_autosys
    permission: gx,ge,wx,we,mx,me
    /* Permission codes:
       g = group, w = world, m = all machines (any host, not only the owner's)
       x = execute (run/force-start)
       e = edit (modify JIL definition)
       gx,ge = group can execute and edit
       wx,we = world can execute and edit
       mx,me = execute and edit are allowed from any machine
       The owner can always execute and edit.
       Production best practice: gx,ge only
       Never wx or we in production: too permissive */

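To enforce the "no world access in production" rule, a deployment pipeline can scan permission strings before JIL is loaded. This is a minimal sketch of such a check; it assumes the permission value is a plain comma-separated list with no spaces, as in the example above, and the function name is illustrative.

```shell
# Succeed if a JIL permission string grants world execute (wx) or world edit (we).
# $1: value of the permission attribute, for example "gx,ge,wx,we,mx,me"
grants_world_access() {
    case ",$1," in
        *,wx,*|*,we,*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

Wrapping the value in commas before matching avoids partial matches, so only the exact codes `wx` and `we` trigger the check.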
EEM is the modern RBAC layer. It allows administrators to define roles (operator, developer, admin, read-only) and map them to LDAP/AD groups. This means access is managed centrally via your directory service rather than per-job JIL attributes.
| Role | Typical Permissions | Use Case |
|---|---|---|
| Read-Only | View job status and history only | Business analysts, audit |
| Operator | Force-start, hold, ice, change status | On-call support engineers |
| Developer | Insert/update job definitions in dev/test | Application developers |
| Admin | Full access including delete and security changes | AutoSys team only |
Running a single Event Server is a calculated risk that most teams accept in dev and test. In production, it's a problem. When the EAS goes down, scheduling stops completely — no failover, no queuing to another server, nothing. You lose all visibility into running workloads and no new jobs dispatch until it recovers. AutoSys addresses this with a Primary/Shadow configuration: two Event Servers sharing the same database, with the Shadow ready to take over if the Primary disappears.
The Primary is the active scheduler. The Shadow runs in parallel, reading the same database, staying in sync. It watches the Primary's heartbeat — a regular signal written to the database. When the heartbeat stops, the Shadow waits a configurable timeout, then promotes itself to Primary and resumes dispatching. In practice, failover takes 30–60 seconds. Jobs already running on agents are unaffected — they complete independently and write results back to the database when done.
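The promotion rule is just a comparison between the age of the last heartbeat and the timeout. The sketch below models that comparison so you can reason about timeout values; it is a toy model of the behaviour described above, not AutoSys's actual implementation, and the function name and arguments are hypothetical.

```shell
# Succeed when the Shadow should promote itself.
# $1: epoch seconds of the last heartbeat seen
# $2: current epoch seconds
# $3: promotion timeout in seconds
shadow_should_promote() {
    [ $(( $2 - $1 )) -gt "$3" ]
}
```

With a 60-second timeout, a 30-second gap in heartbeats does not trigger promotion while a 61-second gap does. Raising the timeout trades slower failover for more tolerance of short network interruptions.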
autoping -m EAS_MACHINE_NAME: include this in your monitoring stack.

Symptom: Jobs ran twice. Duplicate records in the data warehouse.
Root cause: Network partition between Primary and Shadow. Shadow couldn't see Primary's heartbeat, promoted itself. Network recovered — now both servers thought they were Primary and dispatched the same jobs.
Fix: Immediately shut down one EAS instance. Identify duplicated records via run history timestamps. Implement a fencing mechanism (STONITH or database-based lock) to prevent dual-active scenarios.
Prevention: Use a dedicated heartbeat network interface separate from the data network. Configure the Shadow with a longer promotion timeout to survive brief network blips.
AutoSys isn't just shell scripts on bare-metal Linux anymore. As a workload automation platform, it needs to integrate with the modern enterprise stack — REST APIs, containers, cloud infrastructure. Modern pipelines call REST APIs mid-run, spin up containers for compute-heavy steps, and dispatch jobs to agents running in cloud VPCs. The integration tooling to support this has been in AutoSys r12.x for a while — it's just less well-documented than the core JIL features.
The Web Services job type (job_type: WS) allows AutoSys to call REST or SOAP endpoints directly as a job — no wrapper script needed. The job succeeds or fails based on the HTTP response code.

    /* Web Services job: calls a REST endpoint */
    insert_job: JOB_TRIGGER_RISK_ENGINE
    job_type: WS
    box_name: BOX_PAYMENT_EOD
    web_svc_url: https://risk-engine.internal/api/v2/run-eod
    web_svc_method: POST
    web_svc_body: {"run_date":"$$GVAR_RUN_DATE","mode":"full"}
    web_svc_success_codes: 200,201,202
    condition: success(JOB_LOAD_TXN)
    max_run_alarm: 15

AutoSys supports running Docker containers as jobs via the DOCKER job type or via CMD jobs that invoke docker run or kubectl. The container job type manages the container lifecycle — pull, run, capture exit code, clean up.

    /* CMD job wrapping a Docker container run */
    insert_job: JOB_RISK_CALC_CONTAINER
    job_type: CMD
    command: docker run --rm \
      -e RUN_DATE=$$GVAR_RUN_DATE \
      -v /data/risk:/data \
      registry.internal/risk-calculator:2.1.4 \
      --mode full-eod
    machine: docker-host-01
    owner: svc_autosys
    max_run_alarm: 45

    /* Kubernetes job via kubectl */
    insert_job: JOB_RISK_CALC_K8S
    job_type: CMD
    command: kubectl create job risk-calc-$$GVAR_RUN_DATE \
      --from=cronjob/risk-calculator \
      --namespace=batch-jobs
    machine: k8s-bastion-01
    max_run_alarm: 60

AutoSys cloud agents run on AWS EC2, Azure VMs, or GCP instances and register back to the on-premises Event Server. From AutoSys's perspective they're just another machine attribute — the job definition is identical whether it runs on-prem or in cloud. This makes AutoSys a genuine hybrid workload automation platform, enabling seamless integration between on-premises batch processes and cloud-based transformation, ML workloads, and modern data pipelines.
The patterns below cover the two most common cloud integration use cases: triggering AWS Glue / Lambda from an AutoSys CMD job via the AWS CLI, and dispatching Azure Batch tasks via the az CLI. Both use an agent running in the target cloud account — no firewall holes needed for job payloads, only the agent's outbound registration to EAS.

    /* Trigger AWS Glue ETL job from AutoSys: agent runs on EC2 with IAM role */
    insert_job: JOB_AWS_GLUE_ETL
    job_type: CMD
    machine: aws-agent-prod-01 /* EC2 instance with AutoSys agent */
    owner: svc_autosys
    command: aws glue start-job-run \
      --job-name etl-payment-settlements \
      --arguments '{"--run_date":"$$GVAR_RUN_DATE"}' \
      --region eu-west-1 \
      --query 'JobRunId' --output text
    condition: success(JOB_EXTRACT_TXN)
    max_run_alarm: 30
    box_name: BOX_PAYMENT_EOD

    /* Invoke AWS Lambda synchronously and gate on success */
    insert_job: JOB_AWS_LAMBDA_VALIDATE
    job_type: CMD
    machine: aws-agent-prod-01
    owner: svc_autosys
    command: aws lambda invoke \
      --function-name validate-settlement-data \
      --payload '{"date":"$$GVAR_RUN_DATE"}' \
      --cli-binary-format raw-in-base64-out \
      /tmp/lambda_response.json \
      && grep -q '"statusCode":200' /tmp/lambda_response.json
    condition: success(JOB_AWS_GLUE_ETL)
    max_run_alarm: 10
    box_name: BOX_PAYMENT_EOD


    /* Submit Azure Batch task from AutoSys: agent runs on Azure VM with Managed Identity */
    insert_job: JOB_AZURE_BATCH_RISK
    job_type: CMD
    machine: azure-agent-prod-01 /* Azure VM with AutoSys agent */
    owner: svc_autosys
    command: az batch task create \
      --account-name batchprodaccount \
      --job-id risk-calc-job \
      --task-id risk-$$GVAR_RUN_DATE \
      --command-line "/bin/bash -c 'python /scripts/risk_calc.py --date $$GVAR_RUN_DATE'" \
      && az batch task show \
      --account-name batchprodaccount \
      --job-id risk-calc-job \
      --task-id risk-$$GVAR_RUN_DATE \
      --query 'executionInfo.exitCode' | grep -q '^0$'
    condition: success(JOB_LOAD_TXN)
    max_run_alarm: 60
    box_name: BOX_PAYMENT_EOD

Never hardcode cloud credentials in command attributes: they end up in the AutoSys database in plaintext. The correct pattern is to assign an IAM role (AWS) or Managed Identity (Azure) directly to the EC2 / Azure VM running the AutoSys agent. The CLI tools (aws, az) automatically use instance credentials. For Kubernetes workloads, the pattern of creating a job from a CronJob template works well: use kubectl wait --for=condition=complete job/risk-calc-DATE --timeout=3600s in a subsequent CMD job to gate downstream AutoSys jobs on Kubernetes job completion.

Broadcom's AutoSys R26 roadmap introduces AI-native job types and AI-assisted operations as first-class platform features, a significant shift for workload automation. The headline capability is AI job remediation: when a job fails, the platform can analyse the failure pattern, match it against historical incidents, and suggest or automatically apply a recovery action (restart with modified parameters, skip and proceed, or page on-call with a pre-populated runbook). For shops running hundreds of nightly batch jobs, this reduces mean-time-to-recovery on common failure classes without requiring an on-call engineer to diagnose from scratch.
Two additional capabilities worth tracking for R26+: AI job type (dispatching workloads to LLM-backed services as a native job, not a wrapper script), and MCP-based orchestration that allows AutoSys to act as a workflow coordinator for multi-agent AI pipelines — where individual steps are AI model invocations rather than shell commands. These features are in active development as of R26 and will be most relevant for enterprises building ML pipelines on top of existing AutoSys infrastructure. If you're evaluating AutoSys for a new installation today, confirm whether your licensed version includes these capabilities before designing AI-integrated workflows around them.
Job Information Language defines your jobs. The jil command is what loads them into the database. There are three ways to run it — and one of them (the syntax validator) is the one most people skip until they've had a bad day.

    # Method 1: Redirect a JIL script file (most common in production)
    jil < payment_eod.jil

    # Method 2: Interactive mode. Type JIL statements directly, Ctrl+D to commit
    jil
    # jil> insert_job: JOB_TEST job_type: CMD
    # jil> command: /opt/test.sh
    # jil> machine: proc-01
    # jil> ^D (Ctrl+D commits to database)

    # Method 3: Validate syntax WITHOUT committing to database
    # Always run this before applying in production
    jil -syntax < payment_eod.jil
    # If valid: no output and exit code 0
    # If invalid: error message with line number

    # Update an existing job definition
    update_job: JOB_EXTRACT_TXN
    max_run_alarm: 90 /* change one attribute, rest unchanged */

    # Delete a job
    delete_job: JOB_OLD_EXTRACT

    # Delete a box AND all jobs inside it
    delete_box: BOX_OLD_PIPELINE

    # Delete a global variable
    delete_glob: GVAR_DEPRECATED_FLAG

    # Register a machine in AutoSys topology
    insert_machine: new-etl-server-01
    max_load: 10
    factor: 1.00
    opsys: LINUX
    description: "New ETL processing server"

Always run jil -syntax < script.jil before applying to production. The syntax checker validates the entire script without touching the database. A single syntax error in a 200-job JIL file will abort the entire import, leaving you with a partially applied definition. Validate first, always.

update_job only changes the attributes you specify. All other attributes remain exactly as they were. This is the safe way to change a single attribute on a live job without risk of accidentally resetting other settings. Use insert_job only when creating a new job or when you deliberately want to reset all attributes.

Symptom: Only 40 of 60 jobs in a JIL script were created. The other 20 were missing with no error in the AutoSys logs.
Root cause: A syntax error on line 180 caused the jil command to abort mid-import. Jobs defined after line 180 were never loaded.
Fix: Ran jil -syntax < script.jil, found the error (a missing colon in a condition attribute), fixed it, re-ran the full import. Jobs already created by the partial import needed to be manually deleted first.
Prevention: Always validate with -syntax before importing. In CI/CD pipelines, add jil -syntax as a mandatory gate before any JIL deployment.
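A CI gate along those lines can be very small. The sketch below runs a checker over every JIL file, prints the ones that fail, and returns non-zero so the pipeline stops. It assumes the checker reads JIL on standard input and exits non-zero on a syntax error, as described for `jil -syntax` above; the function name and the choice to pass the checker as an argument are illustrative.

```shell
# Print every JIL file that fails the syntax checker.
# $1: checker command that reads JIL on stdin, for example "jil -syntax"
# remaining arguments: JIL files to validate
# Returns non-zero if at least one file failed.
failed_jil_files() {
    fj_checker=$1
    shift
    fj_status=0
    for fj_file in "$@"; do
        if ! $fj_checker < "$fj_file" >/dev/null 2>&1; then
            printf '%s\n' "$fj_file"
            fj_status=1
        fi
    done
    return "$fj_status"
}

# Example CI step (not executed here):
#   failed_jil_files "jil -syntax" deploy/*.jil || exit 1
```

Checking every file before loading any of them is the point: the import only starts once the whole change set is known to parse.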
AutoSys virtual machines have nothing to do with hypervisors. They're logical pools of real agent machines — you point a job at the pool and AutoSys picks which agent to actually run it on. If you're running dozens of batch jobs that can execute on any of several identical servers, virtual machines let you distribute that load automatically instead of hardcoding machine names into every job definition.

    /* Define a virtual machine containing 3 real agents */
    insert_machine: VM_ETL_POOL
    type: v /* v = virtual machine */
    machine_method: ROUNDROBIN /* distribute jobs evenly */
    real_machines: etl-proc-01 etl-proc-02 etl-proc-03
    description: "ETL processing pool, 3 agents"

    /* Job targets the virtual machine: AutoSys picks the real agent */
    insert_job: JOB_PROCESS_BATCH
    job_type: CMD
    command: /opt/etl/process.sh
    machine: VM_ETL_POOL /* targets pool, not specific machine */
    max_run_alarm: 30

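To make the ROUNDROBIN behaviour concrete, here is a toy model of sequential distribution: the Nth dispatch goes to machine N modulo the pool size. It only illustrates the idea and says nothing about how AutoSys tracks the counter internally; the function name is hypothetical and the pool must contain at least one machine.

```shell
# Print the agent that receives the Nth dispatch (N starts at 0).
# $1: dispatch counter
# remaining arguments: real machines in the pool (at least one)
pick_round_robin() {
    rr_skip=$(( $1 % ($# - 1) ))   # $# still counts the counter argument here
    shift
    while [ "$rr_skip" -gt 0 ]; do
        shift
        rr_skip=$((rr_skip - 1))
    done
    printf '%s\n' "$1"
}
```

With the three-agent pool above, dispatches 0, 1, 2 and 3 land on etl-proc-01, etl-proc-02, etl-proc-03 and etl-proc-01 again. That is why round robin suits uniform job sizes: it balances job counts, not actual load.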
| Method | How It Works | Best For |
|---|---|---|
| ROUNDROBIN | Jobs distributed sequentially across all available agents | Uniform job sizes, simple pools |
| CPU_MON | Job sent to agent with lowest current CPU usage | Mixed workloads with variable CPU demand |
| JOB_LOAD | Uses job_load and max_load attributes to track theoretical load | Jobs with known resource weights |
CPU_MON requires the rstatd daemon running on all target machines. If it's not running, AutoSys cannot read CPU usage and silently falls back to a different selection method, so jobs may not distribute as expected. Confirm rstatd status before using CPU_MON in production.

The condition attribute has more options than most people use. The basics, success(JOB) and AND/OR combinations, cover 80% of pipelines. But the other 20% is where things get interesting: jobs that should only run if something is not running, conditions that check freshness with lookback windows, and exit codes that aren't as binary as they look.

    /* done(): runs if job is in ANY completed state (SUCCESS, FAILURE, TERMINATED).
       Use when you want to proceed regardless of how the upstream job ended */
    condition: done(JOB_OPTIONAL_CLEANUP)

    /* notrunning(): runs only if the specified job is NOT currently executing.
       Use to prevent two jobs running on the same resource simultaneously */
    condition: notrunning(JOB_DB_MAINTENANCE)

    /* failure() with lookback: trigger if upstream failed recently.
       Useful for alerting jobs that should only fire on fresh failures */
    condition: failure(JOB_PAYMENT_FEED, 01.00)

    /* Lookback using colon syntax: must escape colon with backslash.
       Both formats are valid: 01.30 and 01\:30 */
    condition: success(JOB_RISK_ENGINE, 01\:30)

    /* Combination: complex real-world condition */
    condition: success(JOB_EXTRACT, 02.00) AND notrunning(JOB_DB_BACKUP) AND value(GVAR_MARKET_OPEN) = "Y"

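The lookback rule is easier to reason about with a worked model. The Python sketch below evaluates a `success(JOB, HH.MM)` style check against a known completion time. The function names are hypothetical, and the model deliberately covers only the plain "succeeded within the window" case shown in the examples above.

```python
from datetime import datetime, timedelta

def parse_lookback(text):
    """Turn an 'HH.MM' or escaped 'HH\\:MM' lookback string into a timedelta."""
    hours, minutes = text.replace("\\:", ".").replace(":", ".").split(".")
    return timedelta(hours=int(hours), minutes=int(minutes))

def success_within(last_success, now, lookback=None):
    """Model success(JOB, lookback).

    last_success is the end time of the job's most recent SUCCESS, or None.
    Without a lookback, any earlier success satisfies the condition.
    """
    if last_success is None:
        return False
    if lookback is None:
        return True
    return now - last_success <= parse_lookback(lookback)

now = datetime(2026, 4, 17, 22, 0)
finished = datetime(2026, 4, 17, 20, 15)        # succeeded 1h45m ago

print(success_within(finished, now))            # True
print(success_within(finished, now, "01.30"))   # False
print(success_within(finished, now, "02.00"))   # True
```

The first call shows why a missing lookback is risky: the same stale success satisfies the condition no matter how old it is.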
By default AutoSys marks a job FAILURE if it exits with any non-zero code. The max_exit_success attribute lets you define a threshold — any exit code up to and including that value is treated as SUCCESS. Critical for scripts that use exit codes to signal warnings rather than failures.

    insert_job: JOB_DATA_VALIDATION
    job_type: CMD
    command: /opt/validate/run_checks.sh
    machine: etl-proc-01
    max_exit_success: 4
    /* Exit codes 0-4 treated as SUCCESS
       Exit code 0     = all checks passed
       Exit codes 1-4  = warnings (some checks failed but acceptable)
       Exit code 5+    = FAILURE (critical errors) */

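The threshold rule reduces to a single comparison. This short Python sketch (the function name `classify_exit` is hypothetical) shows how the exit codes from the example above would be classified with `max_exit_success: 4`.

```python
def classify_exit(exit_code, max_exit_success=0):
    """Return the status a completed command would get under the threshold rule."""
    return "SUCCESS" if 0 <= exit_code <= max_exit_success else "FAILURE"

for code in (0, 3, 4, 5, 127):
    print(code, classify_exit(code, max_exit_success=4))
# 0 SUCCESS
# 3 SUCCESS
# 4 SUCCESS
# 5 FAILURE
# 127 FAILURE
```

The threshold is a ceiling, not a list, so it only suits scripts whose warning codes sit below their error codes.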
Without resource controls, AutoSys Workload Automation dispatches jobs as fast as conditions allow. That's fine until four load jobs hit the same Oracle database simultaneously and everything grinds to a halt. Resources are named concurrency counters — you define how many units exist, jobs declare how many they consume, and AutoSys queues the rest until capacity frees up.

    /* Define a resource: max 3 concurrent DB connections */
    insert_resource: DB_CONNECTIONS
    quantity: 3
    description: "Max concurrent Oracle DW connections"

    /* Jobs that consume this resource: each consumes 1 unit */
    insert_job: JOB_LOAD_PAYMENTS
    job_type: CMD
    command: /opt/etl/load_payments.sh
    machine: etl-proc-01
    resources: DB_CONNECTIONS        /* consumes 1 unit */

    insert_job: JOB_LOAD_TRADES
    job_type: CMD
    command: /opt/etl/load_trades.sh
    machine: etl-proc-02
    resources: DB_CONNECTIONS        /* waits if 3 already running */

    /* A heavy job consuming multiple units */
    insert_job: JOB_BULK_LOAD
    job_type: CMD
    command: /opt/etl/bulk_load.sh
    machine: etl-proc-01
    resources: DB_CONNECTIONS(2)     /* consumes 2 units, counts as 2 connections */

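The "named concurrency counter" idea can be modelled in a few lines. The Python sketch below is an illustration only: the class name `ResourcePool` is hypothetical, and the strict first-in, first-out queue is an assumption of the model rather than a statement about how AutoSys orders waiting jobs.

```python
from collections import deque

class ResourcePool:
    """Counts units of one named resource and queues jobs that do not fit."""

    def __init__(self, name, quantity):
        self.name = name
        self.free = quantity
        self.held = {}            # job name -> units currently held
        self.waiting = deque()    # (job name, units) in arrival order

    def request(self, job, units=1):
        """Start the job if enough units are free, otherwise queue it."""
        if units <= self.free:
            self.free -= units
            self.held[job] = units
            return True
        self.waiting.append((job, units))
        return False

    def release(self, job):
        """Return a finished job's units and start queued jobs that now fit."""
        self.free += self.held.pop(job)
        started = []
        while self.waiting and self.waiting[0][1] <= self.free:
            nxt, units = self.waiting.popleft()
            self.free -= units
            self.held[nxt] = units
            started.append(nxt)
        return started

pool = ResourcePool("DB_CONNECTIONS", 3)
print(pool.request("JOB_LOAD_PAYMENTS"))   # True: 2 units left
print(pool.request("JOB_BULK_LOAD", 2))    # True: 0 units left
print(pool.request("JOB_LOAD_TRADES"))     # False: queued
print(pool.release("JOB_BULK_LOAD"))       # ['JOB_LOAD_TRADES']
```

Note how the two-unit job blocks a one-unit job even though only two jobs are running. That is the behaviour to expect when weighting heavy jobs.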
Use autorep -r DB_CONNECTIONS -s to check current resource utilisation.
Most teams handle file transfers with wrapper shell scripts that call sftp or ftp, and then spend time debugging script quoting issues, missing error handling, and logs that don't tell you what actually failed. The FTP job type (job_type: FTP) replaces all of that. AutoSys Workload Automation manages the connection, authentication, transfer, and exit code natively.

    /* FTP job: download file from remote server */
    insert_job: JOB_FTP_GET_PAYMENT
    job_type: FTP
    box_name: BOX_PAYMENT_EOD
    ftp_machine: sftp.partner-bank.com
    ftp_user: svc_transfer
    ftp_password: %%ENCRYPTED_PASSWORD%%
    ftp_src_file: /outgoing/payment_$$GVAR_RUN_DATE.csv
    ftp_dest_file: /data/incoming/payment_$$GVAR_RUN_DATE.csv
    ftp_dest_dir: /data/incoming
    machine: file-drop-01            /* agent that performs the transfer */
    description: "Download daily payment file from partner bank"
    max_run_alarm: 15

    /* FTP job: upload results to remote server */
    insert_job: JOB_FTP_PUT_REPORT
    job_type: FTP
    box_name: BOX_PAYMENT_EOD
    ftp_machine: reporting.internal
    ftp_user: svc_reports
    ftp_src_file: /data/reports/eod_$$GVAR_RUN_DATE.csv
    ftp_dest_dir: /reports/incoming
    machine: file-drop-01
    condition: success(JOB_GENERATE_REPORT)

The %%ENCRYPTED_PASSWORD%% pattern uses the AutoSys credential vault.
If you've ever had a month-end job silently not run because nobody thought about what happens when the last day of the month falls on a weekend, you needed calendars. Calendars are named sets of dates that jobs and boxes reference for scheduling. Instead of hardcoding days_of_week and hoping it covers edge cases, you define the dates centrally and reference them across as many jobs as needed.

    /* Standard calendar: specific dates the job SHOULD run */
    insert_calendar: CAL_TRADING_DAYS_2026
    datetimes: 01/02/2026 01/05/2026 01/06/2026 01/07/2026 01/08/2026 01/09/2026 /* add all trading days */
    description: "NYSE trading days 2026"

    /* Extended calendar: calculates dates by rule.
       last_business_day: runs on last business day of each month */
    insert_calendar: CAL_MONTH_END
    type: extended
    definition: "last_business_day"
    description: "Last business day of each month"

    /* Exception calendar: dates the job should NOT run.
       Use with run_calendar to exclude holidays */
    insert_calendar: CAL_HOLIDAYS_2026
    datetimes: 01/01/2026 05/25/2026 07/04/2026 12/25/2026

    /* Apply calendar to a box job */
    insert_job: BOX_TRADING_EOD
    job_type: BOX
    run_calendar: CAL_TRADING_DAYS_2026
    exclude_calendar: CAL_HOLIDAYS_2026
    start_times: "18:00"


    # List all dates in a calendar (critical for month-end debugging)
    autorep -c CAL_TRADING_DAYS_2026 -t

    # List all defined calendars
    autorep -c %

    # Check next scheduled run dates for a box
    autorep -J BOX_TRADING_EOD -d | grep -i calendar

    # Forecast when a job will next run (shows upcoming scheduled dates)
    forecast -J BOX_TRADING_EOD -t 30

Symptom: Month-end reporting jobs didn't run in March. No failures — jobs show INACTIVE the entire day.
Root cause: The calendar CAL_BUSINESS_DAYS_2026 was built from a template that didn't account for March 31 being a Tuesday (valid business day). A data entry error left it off the datetimes list.
Fix: autorep -c CAL_BUSINESS_DAYS_2026 -t confirmed March 31 was missing. Added it with update_calendar JIL, then force-started the box with sendevent -E FORCE_STARTJOB -J BOX_EOM_REPORTS -D 20260331.
Prevention: After building any annual calendar, run autorep -c CALENDAR_NAME -t and manually verify the count of dates matches expected business days for the year.
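Counting dates by hand is error-prone, so the verification step can be scripted. The Python sketch below computes the expected business days for a year and reports any that are absent from a calendar's date list. The function names are hypothetical, "business day" is assumed to mean Monday to Friday minus a supplied holiday list, and the calendar dates are assumed to have been parsed into `date` objects already.

```python
from datetime import date, timedelta

def business_days(year, holidays=()):
    """All Monday-Friday dates in the year that are not listed holidays."""
    day, out = date(year, 1, 1), []
    while day.year == year:
        if day.weekday() < 5 and day not in holidays:
            out.append(day)
        day += timedelta(days=1)
    return out

def missing_dates(calendar_dates, year, holidays=()):
    """Expected business days that are absent from the calendar's date list."""
    present = set(calendar_dates)
    return [d for d in business_days(year, holidays) if d not in present]

# Holiday dates taken from the CAL_HOLIDAYS_2026 example above.
holidays = {date(2026, 1, 1), date(2026, 5, 25), date(2026, 7, 4), date(2026, 12, 25)}
expected = business_days(2026, holidays)

# Simulate the data entry error from the incident: March 31 left off the list.
calendar = [d for d in expected if d != date(2026, 3, 31)]
print(missing_dates(calendar, 2026, holidays))   # [datetime.date(2026, 3, 31)]
```

A diff like this names the missing date directly, which is faster than comparing totals and then hunting for the gap.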
Rename a job. Delete a job. Move a job between boxes. Any of those changes can silently break conditions in jobs that reference the old name — and you won't know until those downstream jobs sit in ACTIVATED forever with no error, no alarm, no indication that anything is wrong. In AutoSys Workload Automation, a condition referencing a non-existent job evaluates silently to false. job_depends is the command that catches this before it catches you.

    # Check if all condition references in a job are valid
    # Reports any job names referenced in conditions that don't exist in AutoSys
    job_depends -J JOB_LOAD_TXN

    # Check an entire box and all its children
    job_depends -J BOX_PAYMENT_EOD

    # Check all jobs in the system (run this after any large JIL import)
    job_depends -J %

    # Example output when a dependency is broken:
    # JOB_LOAD_TXN: condition job JOB_TRANSFORM_V2_TXN not found in database
    # This means JOB_LOAD_TXN will never start because its condition is always false

Run job_depends -J % after every JIL deployment. When jobs are renamed, deleted, or migrated between environments, condition references can become dangling pointers. job_depends catches all of them in one pass. A job with a broken condition will silently never run: there is no error, no alarm, just an ACTIVATED job that waits forever.
Symptom: After a JIL refactor renaming JOB_EXTRACT to JOB_EXTRACT_TXN, 12 downstream jobs stopped running. They showed ACTIVATED indefinitely.
Root cause: The 12 jobs had condition: success(JOB_EXTRACT). The old name no longer existed. Conditions silently evaluated to false.
Fix: job_depends -J % immediately identified all 12 broken references. Updated all conditions to reference JOB_EXTRACT_TXN.
Prevention: Make job_depends -J % part of your deployment runbook. Run it after every JIL change that renames or deletes jobs.
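The same class of problem can also be caught before deployment by scanning the JIL file itself. The Python sketch below is a rough offline complement to job_depends, not a replacement: the function name is hypothetical, it only knows about jobs defined in the text it is given (not the AutoSys database), and it only recognises the long-form condition keywords listed in the regular expression.

```python
import re

JOB_DEF = re.compile(r"^[ \t]*insert_job:[ \t]*(\S+)", re.MULTILINE)
CONDITION = re.compile(r"^[ \t]*condition:[ \t]*(.+)$", re.MULTILINE)
JOB_REF = re.compile(
    r"\b(?:success|failure|done|notrunning|terminated|exitcode)\(\s*([A-Za-z0-9_.\-]+)",
    re.IGNORECASE,
)

def dangling_references(jil_text):
    """Map each job to the condition job names that are not defined in jil_text."""
    defined = set(JOB_DEF.findall(jil_text))
    broken = {}
    # Split the text into one chunk per insert_job statement.
    for chunk in re.split(r"(?=^[ \t]*insert_job:)", jil_text, flags=re.MULTILINE):
        name = JOB_DEF.search(chunk)
        if not name:
            continue
        refs = {ref for cond in CONDITION.findall(chunk) for ref in JOB_REF.findall(cond)}
        missing = sorted(refs - defined)
        if missing:
            broken[name.group(1)] = missing
    return broken

sample = """
insert_job: JOB_EXTRACT_TXN
job_type: CMD

insert_job: JOB_LOAD_TXN
job_type: CMD
condition: success(JOB_EXTRACT) AND notrunning(JOB_DB_BACKUP)

insert_job: JOB_REPORT
job_type: CMD
condition: success(JOB_LOAD_TXN, 02.00)
"""
print(dangling_references(sample))
# {'JOB_LOAD_TXN': ['JOB_DB_BACKUP', 'JOB_EXTRACT']}
```

In the sample, JOB_DB_BACKUP is flagged only because it is not defined in the same file. In a real deployment it might already exist in the scheduler, which is why the post-deployment job_depends run remains the authoritative check.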
AutoSys version numbers matter in production. The version your organisation runs determines which JIL attributes are available, how security is enforced, and whether features like containerised agents or AI job types are in scope. This section tracks the key changes across recent major releases — useful when upgrading, auditing a legacy environment, or comparing capabilities with a prospective employer's stack.
| Release | Key Changes | Impact |
|---|---|---|
| r12.x | Baseline for most legacy enterprise installs. EEM introduced for RBAC. Oracle / MSSQL backend. Web UI (WAAE). JIL-based job management. WS and FTP job types available. | Still running in many banks and telcos. Most JIL patterns in this guide apply fully. |
| r11.3 / r11.3.6 | Widely deployed version in financial services. Stable but approaching end of support in many regions. | Check Broadcom support lifecycle before planning new deployments on this version. |
| r21 | Modernised Web UI. Improved REST API surface. Enhanced container job support. Updated agent communication protocols. | First version where container and REST job types are production-grade for most use cases. |
| r24.1 | Secure Agent Communication (HTTPS) — agents register and communicate over TLS by default. Enhanced Security hardening across EEM integration. UI modernisation continued. Improved telemetry controls. | If your organisation has strict data-in-transit requirements, r24.1 is the minimum version to target for new deployments. Existing r12.x environments upgrading to r24.1 need to plan agent certificate rollout. |
| r26 (roadmap) | AI job remediation — failure pattern matching with automated recovery suggestions. AI job type — LLM-backed service invocation as a native job construct. MCP orchestration — AutoSys as coordinator for multi-agent AI workflows. Further UI and API modernisation. | Not yet GA at time of writing. Evaluate for environments running ML pipelines or looking to reduce on-call toil from batch failure diagnosis. |
The most common upgrade path in enterprise shops is moving off r12.x toward r21 or r24.1. The key compatibility considerations:

    # 1. Audit existing JIL for deprecated attributes before upgrade
    autorep -J % -q | grep -E 'deprecated_attr|old_param'

    # 2. Export all job definitions to JIL files for backup and diff
    autorep -J % -q > all_jobs_backup.jil

    # 3. Validate exported JIL against new version syntax checker
    #    (run on a test instance running the target version)
    jil -syntax < all_jobs_backup.jil

    # 4. Check EEM policy compatibility: role definitions may need migration
    #    Review EEM role mappings before cutover

    # 5. Plan agent certificate rollout for r24.1 HTTPS comms
    #    Each remote agent needs a signed cert or trust of the EAS CA
    autoping -m all_agents   # Verify all agents reachable before upgrade

The core commands (sendevent, autorep, jil) are consistent across all of these; they're just marketing names.
The patterns below are production-tested templates. Copy, rename the job and box identifiers, and adjust the machine and owner attributes for your environment. Every template includes the attributes most commonly omitted in first drafts, the ones that cause 3am pages.

    /* Production CMD job: minimum viable attributes for a reliable batch job */
    insert_job: JOB_YOUR_NAME
    job_type: CMD
    box_name: BOX_YOUR_BOX
    machine: your-agent-hostname
    owner: svc_autosys               /* service account, never a personal account */
    command: /opt/scripts/your_script.sh $$GVAR_RUN_DATE
    std_out_file: /logs/autosys/JOB_YOUR_NAME_$$DATE.out
    std_err_file: /logs/autosys/JOB_YOUR_NAME_$$DATE.err
    alarm_if_fail: 1
    max_run_alarm: 30                /* alert if running more than 30 mins */
    condition: success(JOB_UPSTREAM)
    timezone: GMT                    /* always set explicitly, never rely on agent default */
    description: "What this job does and who owns it: ops@yourcompany.com"


    /* File Watcher with size and age guards: prevents triggering on empty or stale files */
    insert_job: FW_SETTLEMENT_FILE
    job_type: FW
    box_name: BOX_SETTLEMENT_EOD
    machine: file-agent-prod-01
    owner: svc_autosys
    watch_file: /data/incoming/settlement_$$DATE.csv
    min_file_size: 1                 /* reject empty files */
    file_watch_interval: 60          /* check every 60 seconds */
    watch_interval: 60
    alarm_if_fail: 1
    max_run_alarm: 480               /* alert if file doesn't arrive within 8 hours */
    timezone: GMT


    /* Calendar-driven BOX: month-end safe pattern */
    insert_job: BOX_MONTH_END_CLOSE
    job_type: BOX
    owner: svc_autosys
    run_calendar: LAST_BUSINESS_DAY  /* named calendar, not days_of_week */
    start_times: "20:00"
    timezone: GMT
    alarm_if_fail: 1
    box_failure: 1                   /* box goes FAILURE if any child fails */
    description: "Month-end close pipeline: finance-ops@yourcompany.com"

    /* Child job pattern: always reference the box, never schedule children independently */
    insert_job: JOB_EXTRACT_GL
    job_type: CMD
    box_name: BOX_MONTH_END_CLOSE
    machine: batch-agent-prod-01
    owner: svc_autosys
    command: /opt/scripts/extract_gl.sh $$GVAR_RUN_DATE
    std_out_file: /logs/autosys/JOB_EXTRACT_GL_$$DATE.out
    std_err_file: /logs/autosys/JOB_EXTRACT_GL_$$DATE.err
    alarm_if_fail: 1
    max_run_alarm: 45

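Since the point of these templates is the attributes people forget, a pre-review lint can check for them automatically. The Python sketch below parses JIL text and reports which attributes from a required list are missing per job. The `REQUIRED` sets are an assumption distilled from the templates above rather than an official minimum, the function names are hypothetical, and the parser only handles one attribute per line with `job_type` spelled out as CMD, FW, or BOX.

```python
import re

# Assumed minimums, taken from the templates in this section.
REQUIRED = {
    "CMD": {"machine", "owner", "command", "std_out_file", "std_err_file",
            "alarm_if_fail", "max_run_alarm"},
    "FW": {"machine", "owner", "watch_file", "alarm_if_fail", "max_run_alarm"},
    "BOX": {"owner", "alarm_if_fail"},
}

ATTRIBUTE = re.compile(r"^[ \t]*([a-z_]+):[ \t]*(.*)$", re.MULTILINE)

def parse_jobs(jil_text):
    """Return {job name: {attribute: value}} for each insert_job statement."""
    jobs, current = {}, None
    text = re.sub(r"/\*.*?\*/", "", jil_text, flags=re.DOTALL)  # drop comments
    for key, value in ATTRIBUTE.findall(text):
        if key == "insert_job":
            current = jobs.setdefault(value.strip(), {})
        elif current is not None:
            current[key] = value.strip()
    return jobs

def missing_attributes(jil_text):
    """Map each job to the required attributes its definition leaves out."""
    report = {}
    for name, attrs in parse_jobs(jil_text).items():
        required = REQUIRED.get(attrs.get("job_type", "CMD").upper(), set())
        gaps = sorted(required - set(attrs))
        if gaps:
            report[name] = gaps
    return report

draft = """
/* first-draft job, missing the usual suspects */
insert_job: JOB_DEMO
job_type: CMD
machine: batch-agent-prod-01
owner: svc_autosys
command: /opt/scripts/demo.sh
std_out_file: /logs/autosys/JOB_DEMO.out
alarm_if_fail: 1
"""
print(missing_attributes(draft))
# {'JOB_DEMO': ['max_run_alarm', 'std_err_file']}
```

Adjust the required sets to match your own standards. The value is in failing the review automatically rather than relying on someone spotting the omission.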
When an AutoSys job fails in production, work through this checklist before touching anything. Acting before diagnosing is the fastest path to a longer outage.
| # | Check | Command | What to look for |
|---|---|---|---|
| 1 | What is the current job status? | autorep -J JOB_NAME -s | Confirm FAILURE vs TERMINATED vs INACTIVE — they mean different things |
| 2 | What was the exit code? | autorep -J JOB_NAME -d | Exit code 127 = command not found. 126 = permission denied. 1 = script error. 0 = success |
| 3 | Did the job actually start on the agent? | Check std_err_file on the agent machine | Empty err file = job never started. Non-empty = script ran but failed |
| 4 | Is the agent reachable? | autoping -m MACHINE_NAME | If agent is unreachable, job will hang in STARTING indefinitely |
| 5 | Are upstream dependencies satisfied? | autorep -J UPSTREAM_JOB -s | Job may be waiting for a condition that will never be true this run |
| 6 | Is this a timing issue? | autorep -J JOB_NAME -q \| grep run_window | Job may have missed its run window and won't start until next cycle |
| 7 | Has this failed before? | Check job history in WCC or autorep -J JOB_NAME -L | Pattern of failures at same time = environmental issue, not code bug |
| 8 | What changed recently? | Check jil audit log in AutoSys database | Job definition changes, machine changes, calendar updates |
| 9 | Is the box in a bad state? | autorep -J BOX_NAME -s | RUNNING box with child in FAILURE won't auto-recover — may need FORCE_STARTJOB on box |
| 10 | Is this a global variable issue? | autorep -G GVAR_NAME | Empty or wrong value will cause condition-dependent jobs to never start |
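The exit code hints from row 2 are handy to keep in a triage helper so nobody has to remember them at 3am. This Python sketch encodes only the mappings given in the checklist. The function name and the fallback wording are hypothetical.

```python
EXIT_CODE_HINTS = {
    0: "success",
    1: "script error: read std_err_file on the agent",
    126: "permission denied: check execute permission for the owner account",
    127: "command not found: check the command path on the agent",
}

def triage(exit_code):
    """Return a first-look hint for a job's exit code."""
    return EXIT_CODE_HINTS.get(
        exit_code, "application-specific: consult the script's own documentation"
    )

for code in (0, 1, 126, 127, 42):
    print(code, "->", triage(code))
```

Anything outside the table is deliberately reported as application-specific rather than guessed at, because the meaning of other codes depends on the script.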
Before you run any sendevent, run autorep -J JOB_NAME -q and read the full job definition output. Nine out of ten production incidents are caused by a condition attribute, a wrong machine name, or a missing global variable that takes 30 seconds to spot in the definition but 90 minutes to diagnose by guessing.
These come up repeatedly in AutoSys and workload automation interviews at investment banks, telcos, and enterprise tech shops. They cover scheduling logic, automation patterns, and production recovery, and they test whether you've actually been on-call, actually debugged a stuck pipeline, actually had to decide between FORCE_STARTJOB and CHANGE_STATUS under pressure. Knowing the theory is table stakes. The answers here go further.
Run autorep -J BOX_NAME -q to inspect the box's own condition. Also check if any child job is in a state other than SUCCESS; autorep sometimes truncates output. If the box condition references an external job, check that job's status and either fix it or force its status to SUCCESS. If all conditions are genuinely met, try sendevent -E FORCE_STARTJOB -J BOX_NAME to re-evaluate.
Downstream jobs with condition: success(FAILED_JOB) will automatically queue once the re-run succeeds.
n_retrys tells AutoSys to automatically restart a failed job up to N times before marking it as permanently FAILURE. Combined with retry_interval (minutes between retries), it handles transient failures like network timeouts. The key limitation: retries only apply to FAILURE exit codes, not to TERMINATED status (jobs killed via KILLJOB). Also, if the job fails during a box run and the box completes (because other jobs don't depend on this one), retries stop, because the box ending resets all job states.
Step 1: Run autoping -m MACHINE_NAME. If it fails, the agent is unreachable. Fix the agent first (restart cybagent service). Step 2: If autoping succeeds, run autorep -M MACHINE_NAME -r to confirm the machine alias is registered correctly in AutoSys topology. Step 3: SSH to the target machine and verify the owner account exists: id svc_autosys. If the user doesn't exist, the job stays in STARTING indefinitely with no error. Step 4: Check the owner has execute permission on the script path. Step 5: Check EAS logs for dispatch errors.
success(JOB_MARKET_FEED, 00.30) means the job only satisfies the condition if it succeeded within the last 30 minutes. Without a lookback, a job that succeeded hours or even days ago still satisfies the condition. Lookbacks are essential for time-sensitive pipelines (market data feeds, regulatory cutoffs, real-time settlement) where stale data from a previous run completing the condition would be dangerous.
Global variables are referenced with the $$VARNAME syntax in commands or value(VARNAME) in conditions. They're set via sendevent -E SET_GLOBAL or defined in JIL with insert_global. The biggest production risk is persistence: global variables retain their value across box runs. If last night's run set GVAR_PAYMENT_FILE to a specific path and tonight's file watcher fails silently, the variable still holds yesterday's path and downstream jobs will process stale data. Always reset critical variables at the start of each pipeline run.
The machine attribute references the alias registered in the AutoSys topology database, not the server's actual hostname or IP address. These can be identical, but they don't have to be. You can verify registered machine aliases with autorep -M % -r. Using the wrong value (the actual hostname when the topology alias is different) leaves the job stuck in STARTING indefinitely because the Event Server cannot resolve the target agent. Always confirm the topology alias before defining a new job for a new machine.
alarm_if_fail triggers an alarm when a job exits with a non-zero exit code (FAILURE status). alarm_if_terminated triggers an alarm when a job is killed via KILLJOB or by term_run_time expiry (TERMINATED status). In production, set both on critical jobs: a job that hangs and gets killed by term_run_time will not trigger alarm_if_fail because its status is TERMINATED, not FAILURE. Without alarm_if_terminated, a silently hung-and-killed job can go unnoticed.
The permission attribute controls who can execute (x) and edit (e) a job. The prefixes are g (group), w (world), and m (machine, meaning users on any machine rather than only the one where the job was defined). wx means any user in the system can force-start the job; we means any user can modify the JIL definition. In enterprise environments, wx and we are serious security risks: any developer or operator account could accidentally or maliciously modify or trigger a production settlement job. Best practice is gx,ge only, mapping the group to a controlled AD/LDAP group via EEM.
Use the Web Service job type (job_type: WS), which is native in AutoSys r12.x+. Define web_svc_url, web_svc_method (GET/POST), web_svc_body for the request payload, and web_svc_success_codes for the HTTP codes that constitute success (typically 200,201,202). The job succeeds or fails based on the response code, so no wrapper script is needed. Set max_run_alarm to handle hung connections. Global variables can be interpolated into the URL or body using $$GVAR_NAME syntax.
If run_calendar: BUSINESS_DAYS is used and the last day of the month falls on Saturday, the box has no trigger date for that run. Other causes: the calendar was updated and the change wasn't applied to all environments; the box has a days_of_week restriction that conflicts with the calendar; or the Event Server was down during the scheduled window and the missed run wasn't caught up. Diagnose with autorep -c CALENDAR_NAME -t to see all scheduled dates.
std_out_file captures the script's stdout on the remote agent machine: whatever the shell script prints to standard output. The job log in the AutoSys database captures the job's lifecycle events: when it was dispatched, which agent received it, the exit code, start and end timestamps. Both are essential for debugging: the job log tells you what AutoSys did, the std_out_file tells you what the script did. If std_out_file is not set, stdout is lost when the job completes. Use $$DATE in the filename to preserve one log per run rather than overwriting.
Use autorep -J JOB_NAME -q. The -q flag outputs the job definition in JIL format that can be piped directly to the jil command on another instance. This is the standard way to copy jobs between environments (dev → test → prod), create backups before making changes, or document existing job definitions. For a full box including all children: autorep -J BOX_NAME -q exports the box and every job inside it.
Run autorep -G GVAR_NAME to see the current value of the variable. If it's empty or set to an unexpected value, that's why the condition fails. Also run autorep -J JOB_NAME -q to inspect the job's condition attribute and confirm exactly what value the condition is checking. Common scenario: a job has condition: value(GVAR_RUN_DATE) != "" but the variable was never set because an earlier job that calls SET_GLOBAL failed. Fix by manually setting: sendevent -E SET_GLOBAL -G "GVAR_RUN_DATE=20260417", then force-starting the blocked job.
min_file_size sets the minimum file size in bytes that the watched file must reach before the FW job considers it a success. Setting it to 1 prevents the job from triggering on an empty file, a common failure mode in file-based pipelines where the upstream system creates the file immediately but writes data to it over time. Without min_file_size, the FW job succeeds the instant the file appears (even with 0 bytes), the downstream CMD job starts, and attempts to process an empty file. This causes subtle failures that are hard to diagnose because the file exists but contains no data.
Use the -D flag with sendevent: sendevent -E FORCE_STARTJOB -J BOX_NAME -D YYYYMMDD. This triggers the box as if it were running on the specified date, which is critical for date-aware jobs that use $$DATE or date-based global variables. Without the -D flag, FORCE_STARTJOB runs the box with today's date, which would cause the jobs to process the wrong data set. Always confirm the date format matches your AutoSys configuration; some environments use MMDDYYYY.
JIL permissions (permission: gx,ge) are per-job access controls defined in the job definition itself. They control who can execute or edit that specific job based on OS group membership. EEM (Embedded Entitlements Manager) is the centralized RBAC layer that maps roles (Operator, Developer, Admin) to LDAP/AD groups across all jobs in the instance. EEM supersedes JIL permissions in modern AutoSys deployments, allowing consistent access control without touching individual job definitions. Use EEM when you need enterprise-wide role enforcement; use JIL permissions as a secondary layer for job-specific restrictions.
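The min_file_size behaviour described in the file watcher answer above is worth seeing in miniature. This Python sketch models only the size guard, using a hypothetical `file_ready` function. It is not how the agent implements file watching, just a way to see why a one-byte minimum changes the outcome.

```python
import os
import tempfile

def file_ready(path, min_size=1):
    """True only when the file exists and holds at least min_size bytes."""
    try:
        return os.path.getsize(path) >= min_size
    except OSError:
        return False

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "settlement.csv")
    print(file_ready(path))              # False: file has not arrived
    open(path, "w").close()
    print(file_ready(path))              # False: file exists but is empty
    with open(path, "w") as handle:
        handle.write("id,amount\n1,100\n")
    print(file_ready(path))              # True: data has been written
```

With `min_size=0` the second check would return True, which is exactly the empty-file trigger the answer warns about.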
20 years in enterprise IT, the last decade working with AutoSys deployments in banking, insurance, and fintech environments — the kind of shops running 800-job nightly batch windows where a single misconfigured condition: attribute at midnight becomes a 3am incident call. The production incidents, gotchas, and debugging patterns in this guide are drawn from those environments, not from documentation.
I built TheCodeForge because I was tired of documentation that explains what to type without explaining why it works — and what breaks when it doesn't.