AutoSys Box Jobs — Child date_conditions Override Timing
Box RUNNING for 4 hours while children stayed INACTIVE.
20+ years shipping production infrastructure and CI/CD at scale. Drawn from code that ran under real load.
- BOX job = container for grouping child jobs — runs no command itself, controls when children can start (box must be RUNNING first)
- Key components: box_name (child attribute linking to parent), condition (dependency between child jobs), nested boxes (boxes inside boxes)
- Performance: Box status updates are O(number_of_children) — deep nesting (>5 levels) can slow Event Processor
- Production trap: Child job has date_conditions: 1 with start_times inside box — waits for box AND specific time, may never run if box starts after that time
- Biggest mistake: Box shows RUNNING but children not starting — child has unmet condition, ON_HOLD, or machine offline; not an error, just unmet prerequisite
A BOX job in AutoSys is like a project folder on your computer. The folder itself doesn't do any work — it just contains files (child jobs). When you open the folder (start the box), the files inside become available. When all files are complete (all inner jobs succeed), the folder is done. You can even put folders inside folders.
BOX jobs are central to well-organised AutoSys environments. Without boxes, you'd have hundreds of independent jobs with no logical grouping, no shared scheduling, and no way to see the end-to-end status of a business process at a glance. With boxes, you can group related jobs, control their collective schedule, and see in seconds whether your end-of-day run succeeded.
But boxes are dangerous when misused. A child job with its own start_times inside a box may never run because the box starts after that time window. A box shows RUNNING but no children start — operators assume the jobs are broken, but the condition is unmet. And a Super Box (top-level box) can mask failures: if a child box fails, the parent box fails, but you lose visibility of which specific job caused the failure.
By the end you'll know exactly how boxes control child jobs, when boxes succeed or fail, how to nest boxes, and the specific debug steps when inner jobs won't start.
How Nested Box Jobs Actually Control Timing
A nested box job is a box job that contains other box jobs as children. The core mechanic: a parent box gates the timing of every child box and job beneath it, regardless of each child's own date_conditions setting. If the parent box has date_conditions: 1, the entire subtree waits for the parent's start time before any child can run, even if a child box has date_conditions: 0. The gate is absolute, not advisory.
In practice, this creates a strict hierarchical timing model. When a parent box starts, its children become eligible to run based on their own conditions (job dependencies, date conditions), but the parent's start time gates the entire tree. If a child box also has date_conditions: 1, its own start time cannot fire until the parent is RUNNING, and it must still be reached after that. This is not a bug; it's by design. AutoSys evaluates starting conditions top-down: the parent must be RUNNING first, then the child's own conditions apply.
Use nested boxes when you need a coordinated batch window across multiple job groups — for example, an overnight processing window that must not start before 2:00 AM. The parent box enforces the window; child boxes handle intra-group dependencies. Without this override, each child box could start independently, breaking the batch boundary. In production, this is the difference between a controlled pipeline and jobs bleeding into the wrong shift.
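As a rough sketch of that batch-window layout, the JIL below defines a parent box that opens at 02:00 and two child boxes that carry no schedule of their own. Every job name, the host name, and the script paths are hypothetical, and attribute spellings should be checked against your release.

```jil
/* Parent box: owns the batch window. Nothing below it can start before 02:00. */
insert_job: nightly_window
job_type: BOX
date_conditions: 1
days_of_week: all
start_times: "02:00"

/* Child box with no date_conditions: eligible as soon as the parent is RUNNING. */
insert_job: extract_group
job_type: BOX
box_name: nightly_window

/* Second child box: ordered after the first by a condition, not by a clock. */
insert_job: load_group
job_type: BOX
box_name: nightly_window
condition: s(extract_group)

/* One command job per child box (hypothetical host and scripts). */
insert_job: extract_orders
job_type: CMD
box_name: extract_group
machine: batchhost01
command: /opt/batch/extract_orders.sh

insert_job: load_orders
job_type: CMD
box_name: load_group
machine: batchhost01
command: /opt/batch/load_orders.sh
```

Only the parent carries a schedule here, so the window boundary lives in one place and the child boxes express ordering alone.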
How BOX jobs control child job execution
The box controls three things for its child jobs:
1. When they can start: Child jobs can only start when the box is in RUNNING state
2. The execution environment: All child jobs inherit the box's scheduling context
3. Collective status: The box reports SUCCESS only when ALL child jobs succeed
A child job's own conditions (start_times, conditions) still apply within the box. If Job B has condition: success(Job A), it still waits for Job A even though the box is running.
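A minimal sketch of that situation, assuming hypothetical job names, host, and scripts: both jobs sit in the same box, but job_b still waits for job_a because of its own condition.

```jil
/* The box itself runs no command. */
insert_job: daily_box
job_type: BOX

/* job_a has no extra conditions: it starts when daily_box goes RUNNING. */
insert_job: job_a
job_type: CMD
box_name: daily_box
machine: batchhost01
command: /opt/batch/step_a.sh

/* job_b needs daily_box RUNNING and job_a in SUCCESS. */
insert_job: job_b
job_type: CMD
box_name: daily_box
machine: batchhost01
command: /opt/batch/step_b.sh
condition: s(job_a)
```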
Nested boxes — boxes inside boxes
You can place a BOX job inside another BOX job. This creates a hierarchy that lets you organise complex batch flows into logical sub-processes.
In a nested setup, the parent box must be RUNNING before the child box can start. The child box must be RUNNING before its own child jobs can start. The parent box succeeds only when ALL child boxes (and their contents) have succeeded.
Use autorep -J master_box% -s FA to find failed jobs and traverse down.
Debugging: box is running but inner jobs aren't starting
This is one of the most common issues you'll encounter. The box is in RUNNING state, but the jobs inside it stay INACTIVE or never start.
Most common causes:
1. The child job's own starting conditions aren't met (check the condition attribute)
2. The child job has date_conditions: 1 with start_times set, so it's waiting for that specific time even though the box is running
3. The child job is ON_HOLD or ON_ICE
4. The child job's machine is offline (PEND_MACH)
5. The box_name attribute on the child job has a typo
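Cause 2 is the easiest to miss, so here is a hedged sketch of what such a definition can look like. The job names, host, and script path are hypothetical.

```jil
/* The box opens at 18:00. */
insert_job: etl_box
job_type: BOX
date_conditions: 1
days_of_week: all
start_times: "18:00"

/* The child carries its own clock: it needs etl_box RUNNING and 22:00 reached. */
insert_job: etl_transform
job_type: CMD
box_name: etl_box
machine: batchhost01
command: /opt/etl/transform.sh
date_conditions: 1
days_of_week: all
start_times: "22:00"
```

With this definition the box goes RUNNING at 18:00, while etl_transform does not start until 22:00.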
- The child has date_conditions: 1 and start_times that are later than the box start time. The box runs, but the child waits for the clock.
- The child has condition: success(other_job) where other_job is not in the box and never runs.
- Run autorep -J job -q to see the child's full JIL definition (-d prints the run detail report, not the attributes). Look for date_conditions: 1 and start_times.
Why Your Box Job Status Lies to You
Every junior engineer has stared at a BOX job showing SUCCESS while the downstream pipeline blew up. Here's the dirty secret: a box runs no command and has no exit code of its own. Its status is derived from its children, and that derivation can be overridden. A box with box_success: s(job_a) reports SUCCESS once job_a succeeds, whatever the other 49 jobs did. A child that exits non-zero but within its max_exit_success still counts as SUCCESS. That status can be a polite fiction.
What actually matters is how the box evaluates its children. When a BOX starts, it evaluates child dependencies, starts the eligible jobs, and waits. It doesn't fail the instant a child fails: by default it stays RUNNING until every child has completed, then reports FAILURE if at least one child failed. A box_failure condition changes that. Note that max_run_alarm only raises an alarm; it does not fail the box.
You want truth? Stop reading the box's status alone. Query the box's children via autorep -J BOX_NAME -w. If the box shows RUNNING but kids are idle, you've got a dependency chain issue, not a failing box. The box status is a LinkedIn profile: always polished. The children are the work logs.
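To make the gap between box status and child status concrete, here is a small sketch that uses the box_success attribute. The job names, host, and scripts are hypothetical.

```jil
/* The box is told to judge itself on report_build alone. */
insert_job: report_box
job_type: BOX
box_success: s(report_build)

insert_job: report_build
job_type: CMD
box_name: report_box
machine: batchhost01
command: /opt/reports/build.sh

/* The outcome of this job is not part of the box_success condition. */
insert_job: report_email
job_type: CMD
box_name: report_box
machine: batchhost01
command: /opt/reports/email.sh
```

If report_build succeeds, report_box reports SUCCESS whether or not report_email does, which is why a condition on the box tells you less than a condition on the job you actually care about.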
If you condition on s(BOX), you'll trigger on a lie. Always condition on the last real job inside the box.
Nested Box Termination Rules That Will Burn You
You built a three-level deep nested box. The outer box fails. You assume everything stops. Wrong. By default AutoSys does nothing to the children. Two per-job attributes control termination: box_terminator: 1 on a child terminates its containing box if that child fails, and job_terminator: 1 on a child terminates that child if its containing box fails or is terminated.
Without them, when the outer box is marked FAILURE (for example through a box_failure condition or a forced status change), it just marks itself failed. Children that are already running keep running like independent contractors who didn't get the memo. The card gets charged twice because JOB_CHARGE_CARD kept executing inside a dead box.
Set job_terminator: 1 on every child that must not outlive its box. This kills the running child when the box fails or is terminated. The tradeoff: you can't recover partial work. If that's a problem, leave job_terminator off for that job and handle cleanup in a downstream job.
Another gotcha: these attributes are set per job, so don't assume an inner box or its jobs pick up the behaviour you configured at the outer level. Test this in a lab. I've seen production databases corrupted because an inner box's termination behaviour was not what the team assumed.
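AutoSys JIL provides two per-job attributes for termination behaviour, box_terminator and job_terminator. A minimal sketch of how they might be combined follows; the job names, host, and scripts are hypothetical, and the behaviour is worth confirming in a lab on your release.

```jil
insert_job: payment_box
job_type: BOX

/* If this job fails, its containing box is terminated. */
insert_job: validate_cards
job_type: CMD
box_name: payment_box
machine: batchhost01
command: /opt/pay/validate.sh
box_terminator: 1

/* If the containing box fails or is terminated, this job is terminated too. */
insert_job: charge_cards
job_type: CMD
box_name: payment_box
machine: batchhost01
command: /opt/pay/charge.sh
job_terminator: 1
```

The two attributes point in opposite directions: box_terminator lets a child bring the box down, and job_terminator lets the box bring a child down.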
The Global Condition Trap in Nested Boxes
You think a condition like 's(BOX_A)' waits for BOX_A's next run to complete? It doesn't wait for a run at all. AutoSys job-status conditions ('s', 'f', 'd', 'n', 't') evaluate against the last known status of the referenced job. A box that succeeded 3 days ago and hasn't run since will satisfy 's(BOX_A)' immediately. That's not a bug; it's how the scheduler works.
Now nest that. You have a job in another box with condition 's(BOX_OUTER)'. Its own box starts before today's BOX_OUTER run, the condition is already met by yesterday's SUCCESS, and the job fires. But today's BOX_OUTER children haven't run yet. You just triggered work outside the box's logical dependency chain. Welcome to race conditions.
Fix: avoid bare status conditions on boxes. Condition on the specific child job, and bound the condition with a look-back where your release supports it. 's(JOB_VALIDATE_ORDER)' is concrete. 's(BOX_ORDER)' is guesswork.
If you absolutely must condition on a box completing, use 'd(BOX_NAME)', which is satisfied once the box is done ('e' compares a job's exit code; it does not mean "end"). Even that can bite you: 'd' is satisfied by any terminal status, including FAILURE and TERMINATED. If you need success, use 's(BOX_NAME)' with a look-back window, for example 's(BOX_NAME, 12.00)' for a success within the last 12 hours.
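One way to stop an old status from satisfying a condition is a look-back window on the condition itself. The sketch below assumes a release that supports look-back conditions; the job names, host, and script are hypothetical.

```jil
/* A bare condition such as s(box_order) is satisfied by any earlier SUCCESS, however old. */

/* Bounded version: box_order must have succeeded within the last 12 hours (hh.mm look-back). */
insert_job: ship_orders
job_type: CMD
machine: batchhost01
command: /opt/orders/ship.sh
condition: s(box_order, 12.00)
```

Pick a window shorter than the interval between runs of the upstream box, so that only the current cycle's result can satisfy the condition.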
Run autorep -J JOB_NAME -w | grep Condition to see what fired it.
The Box That Ran at Midnight but Inside It Stayed Dead
The operator didn't know that a child job with date_conditions: 1 and start_times: "22:00" would wait until 10pm regardless of the box's state. They also didn't know how to check a child job's attributes with autorep -J child -q.
The child jobs had date_conditions: 1 and start_times: "22:00". The ETL box started at 18:00. The child job was waiting for its own start time (22:00) to be reached, not just the box's RUNNING status. The box's start condition and the child's start condition were both required. The operator never checked the child's JIL definition. Four hours later at 22:00, the child jobs started and succeeded, but the operator had already wasted 2 hours of troubleshooting time.
1. Removed date_conditions: 1 and start_times from all child jobs inside the box. The box's schedule should control overall timing.
2. For jobs that genuinely need to start at a specific time within a larger window, documented the behaviour in the team runbook.
3. Added a pre-flight check script: autorep -J child -q | grep -E 'start_times|date_conditions' to flag child jobs with their own schedules.
4. Added a monitoring alert: if box RUNNING > 10 minutes and no children STARTING, check child job conditions and start_times.
- Child jobs are gated by the box's RUNNING status, but they still have their own conditions (start_times, dependencies). Both must be satisfied.
- Always check child job attributes with autorep -J job -q when they don't start. Look for date_conditions: 1 and start_times.
- Generally, child jobs inside a box should not have their own start_times. Let the box control timing.
- When a box is RUNNING but children are INACTIVE, the problem is not the box; it's the children's prerequisites.
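Letting the box control timing can be as small as switching off the child's own date conditions. A minimal sketch, assuming a hypothetical child job named etl_transform:

```jil
/* Turn off the child's own clock so that only the box schedule gates it. */
update_job: etl_transform
date_conditions: 0
```

After the update, the child's remaining starting conditions are its box being RUNNING plus any condition attribute it still carries.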
- Child not starting: run autorep -J child -q | grep start_times. Also check conditions: the child may have condition: success(other_job) that is not met yet. Check the child's ON_HOLD/ON_ICE status with autostatus -J child.
- Child failed: run autorep -J % -d for the jobs inside the box to find which child failed first. There is no need to mark the box as failed with sendevent -E CHANGE_STATUS -J box -s FAILURE; by default the box fails on its own once its children have completed and at least one has failed. To let the box succeed despite a child failure, define box_success on the box (use sparingly).
- Nested box: inspect the child box with autorep -J child_box -d. If the child box shows SUCCESS but inner jobs FAIL, your condition logic may be wrong; the child box may have condition: success(inner_job) that is not evaluated because the failed inner job didn't run.
- Box not starting: verify that date_conditions: 1 and start_times are correctly set. Check if the box has condition: success(other_box) that is not met. Check if the box is ON_HOLD/ON_ICE. Check if the run calendar is active: autorep -C calendar_name.
- Box stuck RUNNING: run autorep -J % -s RU for the jobs inside the box. The stuck job may have hung (infinite loop, waiting for I/O, deadlock). Use term_run_time to kill hung jobs automatically.
autorep -J child_name -q | grep -E 'start_times|condition'
autostatus -J child_name
Key takeaways
Common mistakes to avoid
- Setting date_conditions: 1 and start_times on a child job inside a box
- Putting a machine attribute on a BOX job
- Not setting a condition on the second-to-last job in a chain, causing jobs to run in parallel unexpectedly
  Add condition: success(previous_job) to each job to enforce ordering. If parallel execution is desired, no condition is needed, but be explicit in comments.
- Confusing box state with child job state
  Use autorep -J box_name% to list all children with their statuses. Never assume children are RUNNING just because the box is.
- Nesting boxes too deeply (>5 levels)
Interview Questions on This Topic
What is a BOX job in AutoSys and what does it do?