AutoSys Box Jobs: The 1 Container That Controls Your Batch Pipeline
- BOX job = container for grouping child jobs — runs no command itself, controls when children can start (box must be RUNNING first)
- Key components: box_name (child attribute linking to parent), condition (dependency between child jobs), nested boxes (boxes inside boxes)
- Performance: Box status updates are O(number_of_children) — deep nesting (>5 levels) can slow Event Processor
- Production trap: Child job has date_conditions: 1 with start_times inside box — waits for box AND specific time, may never run if box starts after that time
- Biggest mistake: Box shows RUNNING but children not starting — child has unmet condition, ON_HOLD, or machine offline; not an error, just unmet prerequisite
AutoSys Box Debug Cheat Sheet

Box RUNNING, children INACTIVE
    autorep -J child_name -d | grep -E 'start_times|condition'
    autostatus -J child_name

Box never starts: INACTIVE after scheduled time
    autorep -J box_name -d | grep -E 'start_times|condition'
    autorep -J box_name -s

Box stuck RUNNING: never completes
    autorep -J box_name% -s RU
    autorep -J box_name% -d | grep 'Status: RU' -A5

Box FAILED but need it to continue despite child failure
    autorep -J failing_child -d | grep ignore_failure
    ignore_failure: 1 can be set on the child job

Nested box: parent box RUNNING but child box INACTIVE
    autorep -J child_box -d | grep -E 'condition|start_times'
    autorep -J child_box -s

Production Incident
... condition dependencies. They didn't know that a child job with date_conditions: 1 and start_times: "22:00" would wait until 10pm regardless of the box's state. They also didn't know how to check a child job's attributes with autorep -J child -d.

The child jobs had date_conditions: 1 and start_times: "22:00". The ETL box started at 18:00. Each child job was waiting for its own start time (22:00), not just for the box to reach RUNNING: both the box's start condition and the child's start condition had to be met. The operator never checked the child's JIL definition. Four hours later, at 22:00, the child jobs started and succeeded, but the operator had already wasted 2 hours of troubleshooting time.

1. Removed date_conditions: 1 and start_times from all child jobs inside the box. The box's schedule should control overall timing.
2. For jobs that genuinely need to start at a specific time within a larger window, documented the behaviour in the team runbook.
3. Added a pre-flight check script: autorep -J child -d | grep -E 'start_times|date_conditions' to flag child jobs with their own schedules.
4. Added a monitoring alert: if the box has been RUNNING for more than 10 minutes and no children are STARTING, check child job conditions and start_times.

- Inspect child jobs with autorep -J job -d when they don't start. Look for date_conditions: 1 and start_times.
- Generally, child jobs inside a box should not have their own start_times. Let the box control timing.
- When a box is RUNNING but children are INACTIVE, the problem is not the box; it's the children's prerequisites.

Production Debug Guide
Symptom → Action mapping for common box failures in production AutoSys environments.
- Box RUNNING, children INACTIVE: run autorep -J child -d | grep start_times. Also check the child's condition: it may have condition: success(other_job) that is not met yet. Check the child's ON_HOLD/ON_ICE status: autostatus -J child.
- Box FAILED: run autorep -J % -d inside the box to find which child failed first. There is no need to force the status with sendevent -E CHANGE_STATUS -J box -s FAILURE, because the box fails automatically when a child fails. To keep the box running despite a child failure, use ignore_failure: 1 on the child (use sparingly).
- Nested box, parent RUNNING but child box INACTIVE: run autorep -J child_box -d. If the child box shows SUCCESS but inner jobs FAIL, your condition logic may be wrong; the child box may have condition: success(inner_job) that is never evaluated because the failed inner job didn't run.
- Box never starts: check that the box has date_conditions: 1 and start_times correctly set. Check if the box has condition: success(other_box) that is not met. Check if the box is ON_HOLD/ON_ICE. Check if the run calendar is active: autorep -C calendar_name.
- Box stuck RUNNING: run autorep -J % -s RU for jobs inside the box. The stuck job may have hung (infinite loop, waiting for I/O, deadlock). Use term_run_time to kill hung jobs automatically.

BOX jobs are central to well-organised AutoSys environments. Without boxes, you'd have hundreds of independent jobs with no logical grouping, no shared scheduling, and no way to see the end-to-end status of a business process at a glance. With boxes, you can group related jobs, control their collective schedule, and see in seconds whether your end-of-day run succeeded.
But boxes are dangerous when misused. A child job with its own start_times inside a box may never run because the box starts after that time window. A box shows RUNNING but no children start — operators assume the jobs are broken, but the condition is unmet. And a Super Box (top-level box) can mask failures: if a child box fails, the parent box fails, but you lose visibility of which specific job caused the failure.
By the end you'll know exactly how boxes control child jobs, when boxes succeed or fail, how to nest boxes, and the specific debug steps when inner jobs won't start.
How BOX jobs control child job execution
The box controls three things for its child jobs:
1. When they can start: child jobs can only start when the box is in RUNNING state.
2. The execution environment: all child jobs inherit the box's scheduling context.
3. Collective status: the box reports SUCCESS only when ALL child jobs succeed.
A child job's own starting conditions (start_times, condition) still apply within the box. If job_b has condition: success(job_a), it still waits for job_a even though the box is running.
    /* Box starts at 10 PM on weeknights */
    insert_job: eod_box
    job_type: BOX
    date_conditions: 1
    days_of_week: mon-fri
    start_times: "22:00"
    alarm_if_fail: 1

    /* Job A: starts as soon as box is RUNNING */
    insert_job: job_a
    job_type: CMD
    box_name: eod_box
    command: /scripts/step_a.sh
    machine: server01
    owner: batch

    /* Job B: waits for Job A, but still inside the box */
    insert_job: job_b
    job_type: CMD
    box_name: eod_box
    command: /scripts/step_b.sh
    machine: server01
    owner: batch
    condition: success(job_a)
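The start rule described above can be sketched as a small Python model. This is only an illustration, not how the AutoSys scheduler is implemented: the `ChildJob` class, the `can_start` function and the `timed_child` job are hypothetical names, and the model treats start_times as "not before this time", which is simpler than real calendar-based scheduling.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, Optional, Tuple


@dataclass
class ChildJob:
    """Hypothetical, simplified view of a job defined with box_name."""
    name: str
    needs_success_of: Tuple[str, ...] = ()  # models condition: success(...)
    start_time: Optional[time] = None       # models date_conditions: 1 + start_times


def can_start(job: ChildJob, box_status: str,
              statuses: Dict[str, str], now: time) -> bool:
    """Return True only when the box AND the child's own prerequisites allow a start."""
    if box_status != "RUNNING":
        return False  # children never start outside a RUNNING box
    if any(statuses.get(dep) != "SUCCESS" for dep in job.needs_success_of):
        return False  # an upstream job has not succeeded yet
    if job.start_time is not None and now < job.start_time:
        return False  # assumption: start_times modelled as "not before"
    return True


job_a = ChildJob("job_a")
job_b = ChildJob("job_b", needs_success_of=("job_a",))
timed_child = ChildJob("timed_child", start_time=time(22, 0))  # hypothetical job

statuses = {"job_a": "RUNNING"}
print(can_start(job_a, "RUNNING", statuses, time(18, 0)))        # True
print(can_start(job_b, "RUNNING", statuses, time(18, 0)))        # False
print(can_start(timed_child, "RUNNING", statuses, time(18, 0)))  # False
```

The third call mirrors the production trap: the box is RUNNING at 18:00, but the child with its own 22:00 start time still cannot start.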
Nested boxes — boxes inside boxes
You can place a BOX job inside another BOX job. This creates a hierarchy that lets you organise complex batch flows into logical sub-processes.
In a nested setup, the parent box must be RUNNING before the child box can start. The child box must be RUNNING before its own child jobs can start. The parent box succeeds only when ALL child boxes (and their contents) have succeeded.
    /* Parent box: the overall EOD run */
    insert_job: master_eod_box
    job_type: BOX
    date_conditions: 1
    days_of_week: mon-fri
    start_times: "21:00"

    /* Child box 1: ETL processing */
    insert_job: etl_box
    job_type: BOX
    box_name: master_eod_box  /* inside master box */

    /* Child box 2: reporting (runs after ETL) */
    insert_job: reporting_box
    job_type: BOX
    box_name: master_eod_box
    condition: success(etl_box)  /* waits for ETL box to complete */

    /* Jobs inside etl_box */
    insert_job: etl_extract
    job_type: CMD
    box_name: etl_box
    command: /scripts/extract.sh
    machine: etl01
    owner: batch
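To make the roll-up rule for nested boxes concrete, here is a minimal Python sketch. It follows the rule as stated in this article (any failed child fails the box, all children succeeding makes it succeed) and is not taken from AutoSys itself. The `Job`, `Box`, `rollup` and `failed_paths` names, the `report_job` job and the "INCOMPLETE" result are assumptions made for the illustration; "INCOMPLETE" is not an AutoSys status, it just means the box has not finished.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Job:
    name: str
    status: str = "INACTIVE"


@dataclass
class Box:
    name: str
    children: List[Union["Box", Job]] = field(default_factory=list)


def rollup(node: Union[Box, Job]) -> str:
    """Collapse a box hierarchy into SUCCESS, FAILURE or INCOMPLETE."""
    if isinstance(node, Job):
        return node.status
    child_results = [rollup(child) for child in node.children]
    if any(result == "FAILURE" for result in child_results):
        return "FAILURE"
    if child_results and all(result == "SUCCESS" for result in child_results):
        return "SUCCESS"
    return "INCOMPLETE"  # not an AutoSys status; the box is simply not done


def failed_paths(node: Union[Box, Job], prefix: str = "") -> List[str]:
    """Walk down the hierarchy and list the leaf jobs that actually failed."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    if isinstance(node, Job):
        return [path] if node.status == "FAILURE" else []
    found: List[str] = []
    for child in node.children:
        found.extend(failed_paths(child, path))
    return found


master = Box("master_eod_box", [
    Box("etl_box", [Job("etl_extract", "FAILURE")]),
    Box("reporting_box", [Job("report_job")]),  # report_job is hypothetical
])

print(rollup(master))        # FAILURE
print(failed_paths(master))  # ['master_eod_box/etl_box/etl_extract']
```

The second function addresses the visibility problem with top-level boxes: the parent only reports FAILURE, so finding the responsible job means traversing down the hierarchy.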
Use autorep -J master_box% -s FA to find failed jobs and traverse down.

Debugging: box is running but inner jobs aren't starting
This is one of the most common issues you'll encounter. The box is in RUNNING state, but the jobs inside it stay INACTIVE or never start.
Most common causes:
1. The child job's own starting conditions aren't met (check the condition attribute).
2. The child job has date_conditions: 1 with start_times set, so it's waiting for that specific time even though the box is running.
3. The child job is ON_HOLD or ON_ICE.
4. The child job's machine is offline (PEND_MACH).
5. The box_name attribute on the child job has a typo.
    # Check status of box and all inner jobs
    autorep -J eod_box -s

    # Check a specific inner job's attributes
    autorep -J job_b -d

    # Check if an inner job is on hold or on ice
    autostatus -J job_b

    # Check if the machine for an inner job is active
    autorep -M server01
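The five causes listed above can also be expressed as a triage function. This is a hedged sketch in Python that works on values you would collect by hand from the commands shown; it does not call AutoSys. The `ChildSnapshot` class, the `why_not_starting` function and the example values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ChildSnapshot:
    """Hypothetical summary of what autorep/autostatus report for one child."""
    name: str
    status: str                          # e.g. "INACTIVE", "ON_HOLD", "ON_ICE"
    box_name: str
    machine: str
    condition_jobs: Tuple[str, ...] = ()  # jobs named in condition: success(...)
    start_time: Optional[time] = None     # set when the child has its own start_times


def why_not_starting(child: ChildSnapshot, job_statuses: Dict[str, str],
                     known_boxes: Set[str], machines_online: Set[str],
                     now: time) -> List[str]:
    """Return every reason from the checklist that applies to this child."""
    reasons: List[str] = []
    unmet = [j for j in child.condition_jobs if job_statuses.get(j) != "SUCCESS"]
    if unmet:
        reasons.append("unmet condition: " + ", ".join(unmet))
    if child.start_time is not None and now < child.start_time:
        reasons.append("waiting for start_times " + child.start_time.strftime("%H:%M"))
    if child.status in ("ON_HOLD", "ON_ICE"):
        reasons.append("job is " + child.status)
    if child.machine not in machines_online:
        reasons.append("machine offline: " + child.machine)
    if child.box_name not in known_boxes:
        reasons.append("unknown box_name: " + child.box_name)
    return reasons


job_b = ChildSnapshot("job_b", "INACTIVE", "eod_box", "server01",
                      condition_jobs=("job_a",))
print(why_not_starting(job_b, {"job_a": "RUNNING"}, {"eod_box"},
                       {"server01"}, time(22, 5)))
# ['unmet condition: job_a']
```

An empty list means none of the five checks explains the stall, which is the point at which to look at the box itself.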
- The child has date_conditions: 1 and start_times that are later than the box start time. The box runs, but the child waits for the clock.
- The child has condition: success(other_job) where other_job is not in the box and never runs.
- Run autorep -J job -d to see the child's full attributes. Look for date_conditions: 1 and start_times.

| BOX Job State | What It Means | Can Inner Jobs Run? | How Box Enters This State |
|---|---|---|---|
| RUNNING | Box is active, scheduling conditions are met | Yes — if their own conditions are also met | Start time reached, dependencies satisfied |
| SUCCESS | All inner jobs and child boxes succeeded | No — box has completed | All children SUCCESS |
| FAILURE | At least one inner job or child box failed | No — remaining jobs don't start | Any child FAILURE |
| INACTIVE | Box hasn't been triggered yet (default state) | No | Initial state; never started |
| ON_HOLD | Box manually placed on hold | No — box won't start even if conditions met | Manual: ON_HOLD |
| ON_ICE | Box suspended (like ON_HOLD but more aggressive) | No — box won't start | Manual: ON_ICE |
| TERMINATED | Box was killed while running (manual or term_run_time) | No — child jobs also terminated | Manual KILLJOB or term_run_time exceeded |
🎯 Key Takeaways
- BOX jobs are containers — they don't execute commands, they control when child jobs can start.
- A box enters SUCCESS only when ALL inner jobs have succeeded; it enters FAILURE if any inner job fails.
- You can nest boxes inside boxes to build hierarchical batch workflows.
- Child jobs inside a box should generally not have their own start_times — let the box control timing.
- When a box is RUNNING but children are INACTIVE, the problem is the children's conditions, not the box.
Interview Questions on This Topic
- Q: What is a BOX job in AutoSys and what does it do? (Junior)
- Q: When is a BOX job in SUCCESS state? (Junior)
- Q: If a BOX job is RUNNING but its child jobs aren't starting, what would you check? (Mid-level)
- Q: What is a Super Box in AutoSys? (Mid-level)
- Q: Can a BOX job contain another BOX job? (Junior)
Frequently Asked Questions
What is a BOX job in AutoSys?
A BOX job is a container for other jobs. It doesn't execute any command itself — it groups jobs together and controls when they can run. Child jobs inside a box can only start when the box is in RUNNING state.
When does a BOX job move to SUCCESS?
A BOX job moves to SUCCESS only when ALL of its inner jobs have completed successfully. If any inner job fails, the box moves to FAILURE and remaining un-started inner jobs will not run.
My BOX job is RUNNING but inner jobs aren't starting — why?
Common causes: the inner job has date_conditions: 1 and is waiting for a specific start_times; the inner job has a condition that isn't met yet; the inner job is ON_HOLD or ON_ICE; the inner job's machine is offline; or the box_name attribute has a typo.
What is a Super Box in AutoSys?
Super Box is informal terminology for the highest-level BOX job that contains all other boxes and jobs for a given batch workflow. It provides a single point of control and visibility for the entire process.
Should child jobs inside a box have their own start_times?
Generally no. When a job is inside a box, the box controls the overall timing. Setting start_times on a child job means it will wait until that specific time even if the box is already running, which often causes confusion. Child jobs inside boxes typically rely on conditions (success of previous jobs) rather than clock times.
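One way to catch this before it reaches production is to scan JIL text for child jobs that carry their own schedule. The Python sketch below is a rough illustration, not a full JIL parser: it assumes one attribute per line and single-line comments, and the `parse_jil` and `children_with_own_schedule` functions and the `late_child` job are hypothetical.

```python
import re
from typing import Dict, List


def parse_jil(text: str) -> List[Dict[str, str]]:
    """Split JIL text into one attribute dict per insert_job definition.

    Assumes one "key: value" attribute per line, which real JIL does not require.
    """
    jobs: List[Dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = re.sub(r"/\*.*?\*/", "", raw).strip()  # drop single-line comments
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "insert_job":
            current = {"insert_job": value}
            jobs.append(current)
        elif current is not None:
            current[key] = value
    return jobs


def children_with_own_schedule(text: str) -> List[str]:
    """Return names of jobs that sit inside a box yet define their own timing."""
    flagged = []
    for job in parse_jil(text):
        in_box = "box_name" in job
        timed = job.get("date_conditions") == "1" or "start_times" in job
        if in_box and timed:
            flagged.append(job["insert_job"])
    return flagged


SAMPLE = """
/* Box with its own schedule: fine */
insert_job: eod_box
job_type: BOX
date_conditions: 1
start_times: "22:00"

insert_job: job_a
job_type: CMD
box_name: eod_box
command: /scripts/step_a.sh

/* late_child is a hypothetical job that repeats the trap */
insert_job: late_child
job_type: CMD
box_name: eod_box
date_conditions: 1
start_times: "23:30"
"""

print(children_with_own_schedule(SAMPLE))  # ['late_child']
```

The box itself is not flagged because it has no box_name; only jobs that are both inside a box and time-driven are reported.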
What's the difference between ON_HOLD and ON_ICE for a box?
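Both prevent the box from starting. ON_HOLD is a temporary pause: release the box with sendevent -E JOB_OFF_HOLD -J box_name, and it starts if its conditions are already met. Jobs that depend on a held box keep waiting. ON_ICE takes the box out of the schedule: dependencies that reference it are evaluated as if they were satisfied, so downstream jobs are not blocked, and after sendevent -E JOB_OFF_ICE -J box_name the box waits for its next scheduled start rather than running immediately. Use ON_ICE for boxes you want to disable without stalling downstream work. Use ON_HOLD when downstream jobs should wait for the box.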
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.