AutoSys success() — INACTIVE Jobs Block Chains Silently
success(job_A) stays false indefinitely when job_A is INACTIVE, not failed.
20+ years shipping production infrastructure and CI/CD at scale. Drawn from code that ran under real load.
- AutoSys conditions define when a job can start based on other jobs' statuses
- Four functions: success(), failure(), done(), notrunning()
- Combine with AND / OR and parentheses for complex rules
- Conditions evaluate after the referenced job finishes (or stops running for notrunning)
- Common failure: using success() for cleanup jobs — use done() instead
- In production, complex conditions with many AND/OR terms can mask timing issues; test with autorep first
AutoSys conditions are like traffic lights for your jobs. Job B has a red light until Job A turns it green by succeeding. You can set up complex rules: Job C runs if Job A OR Job B succeeds. Job D only runs if ALL of A, B, and C succeed.
Dependency management is AutoSys's superpower — the thing that justifies its cost over cron. The condition attribute lets you express exactly what must be true before a job can start. You can chain hundreds of jobs with precise dependency rules, and AutoSys handles the orchestration automatically.
This article covers all condition types, how to combine them, and the gotchas that cause dependency chains to break in production.
Why AutoSys Job Dependencies Are Not Just Conditions
AutoSys job dependencies define the execution order of jobs in a batch processing pipeline. The core mechanic is the condition string — a boolean expression referencing upstream job statuses (success(), failure(), terminated(), etc.). When all conditions evaluate to true, the downstream job becomes eligible to start. This is the fundamental scheduling primitive in AutoSys.
A critical property: conditions are evaluated against the most recent run of each referenced job. If an upstream job has never run, its status is INACTIVE, and success() evaluates to false — the downstream job blocks indefinitely. This is not a timeout; it's a silent deadlock. The scheduler does not emit an alert; the job simply never starts.
Use dependencies when you need strict sequential processing — ETL pipelines, data validation chains, or any workflow where job B must only run after job A completes successfully. In practice, this means every upstream job must have at least one completed run in the job's history before the dependency can ever trigger. Teams new to AutoSys often miss this and wonder why their pipeline never kicks off.
If an upstream job has never run, success() returns false. Your downstream job will wait forever with no alert unless you add a timeout or a start condition.
The four condition functions
AutoSys provides four functions for expressing job conditions:
- success(job): Triggers when the referenced job completes with exit code 0
- failure(job): Triggers when the referenced job completes with non-zero exit code
- done(job): Triggers when the referenced job completes, regardless of exit code
- notrunning(job): Triggers when the referenced job is not currently in RUNNING state
Each function takes the name of another job (or box) as its argument. The condition is evaluated when the referenced job changes state. For success(), failure(), and done(), the relevant state change is completion or termination. For notrunning(), it is any transition out of RUNNING; the transition into RUNNING also triggers a re-evaluation, but that is not typical usage.
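The semantics described above can be modelled in a few lines of Python. This is a toy model, not AutoSys's implementation: the status names mirror the ones this article uses, and the Status enum and the predicate functions are hypothetical names chosen for illustration. It shows in particular what each function returns for a job that has never run.

```python
from enum import Enum


class Status(Enum):
    """Job statuses used in this article (a subset of what AutoSys reports)."""
    INACTIVE = "INACTIVE"      # has not run
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    TERMINATED = "TERMINATED"  # killed


def success(status: Status) -> bool:
    return status is Status.SUCCESS


def failure(status: Status) -> bool:
    return status is Status.FAILURE


def done(status: Status) -> bool:
    # Any completed run counts, whatever the outcome.
    return status in (Status.SUCCESS, Status.FAILURE, Status.TERMINATED)


def notrunning(status: Status) -> bool:
    # True for every state except RUNNING, including a job that never ran.
    return status is not Status.RUNNING


def truth_row(status: Status) -> dict:
    """Evaluate all four functions against one upstream status."""
    return {
        "success": success(status),
        "failure": failure(status),
        "done": done(status),
        "notrunning": notrunning(status),
    }


for status in Status:
    print(f"{status.value:<10} {truth_row(status)}")
```

The INACTIVE row is the one to look at: three of the four functions are false and only notrunning() is true.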
Don't use success() for cleanup jobs. Use done() for cleanup, notification, or logging jobs.
Cross-box dependencies
Conditions can reference jobs in other boxes or standalone jobs. This lets you build workflows that span multiple boxes.
When a condition references a job outside the current box, AutoSys uses global job name resolution. The referenced job must have a unique name across the entire AutoSys instance — if duplicates exist, the condition may resolve to the wrong job.
A condition on a single child job does not wait for the rest of its box. To wait for all children of a box, condition on the last child job of that box, or use done(box_name), which completes only when all children of that box have finished.
Use done() on the box job itself, which completes when all its children are done.
The done() condition and why it matters
done() is often overlooked but very useful. It lets a job run after another job completes regardless of whether that job succeeded or failed. This is perfect for cleanup or notification jobs that should always run.
- Cleanup temp files after a job completes
- Send status email with success/failure info
- Update a monitoring system with job execution results
Because done() triggers after any termination — including kill or timeout — it ensures your post-processing runs reliably.
- To react differently to each outcome, put success() and failure() on separate branches.
- done(): cleanup, notification, monitoring
- If the upstream job may never have run, use notrunning() with caution, or force-run the upstream job.
notrunning(): preventing concurrent execution
The notrunning() condition triggers when the referenced job is not in RUNNING state. This is useful for throttling or serialising jobs that share a resource.
Important caveat: notrunning() evaluates true if the referenced job has never run (INACTIVE status). That means if you write condition: notrunning(app_job) and app_job hasn't been scheduled yet, the condition is immediately true, and the dependent job will start right away, which is likely not what you intended.
To prevent concurrent runs of the same job, use condition: notrunning(app_job) where app_job is the same job. AutoSys supports self-referencing conditions for this purpose. When used on the same job, the condition is met only after the previous instance finishes.
If the referenced job has never run, notrunning() returns true. This can cause dependent jobs to start unexpectedly. Always verify the referenced job's schedule and status.
- notrunning() is the standard pattern for job serialization.
- Combine notrunning() with a time condition (start_times) or ensure the referenced job has run at least once.
- For sequential execution, use success() or done().
Complex conditions with AND, OR, and parentheses
You can build complex dependency logic by combining conditions with AND and OR operators, and using parentheses to control evaluation order. AutoSys evaluates conditions from left to right, respecting parentheses. There's no NOT operator.
Example: You need a job to run if either Job A and Job B both succeeded, OR Job C succeeded. The condition would be:
condition: (success(job_a) AND success(job_b)) OR success(job_c)
Without parentheses, evaluation runs left to right, so the AND is applied first here: condition: success(job_a) AND success(job_b) OR success(job_c) is equivalent to: condition: (success(job_a) AND success(job_b)) OR success(job_c)
However, it's safer to always use explicit parentheses for readability and to avoid future misinterpretation.
Complex conditions are powerful but can lead to hard-to-debug behaviours. If you have many conditions, consider breaking the logic into intermediate box jobs to simplify each condition.
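To see why explicit parentheses matter, the sketch below enumerates every combination of outcomes for three jobs and compares two groupings of the same three terms. It is plain boolean logic in Python, not AutoSys's parser, and the function names are illustrative.

```python
from itertools import product


def grouped_left(a: bool, b: bool, c: bool) -> bool:
    # (success(job_a) AND success(job_b)) OR success(job_c)
    return (a and b) or c


def grouped_right(a: bool, b: bool, c: bool) -> bool:
    # success(job_a) AND (success(job_b) OR success(job_c))
    return a and (b or c)


def disagreements() -> list:
    """Outcome combinations where the two groupings give different answers."""
    return [
        (a, b, c)
        for a, b, c in product([False, True], repeat=3)
        if grouped_left(a, b, c) != grouped_right(a, b, c)
    ]


for a, b, c in disagreements():
    print(f"job_a={a} job_b={b} job_c={c}: "
          f"left={grouped_left(a, b, c)} right={grouped_right(a, b, c)}")
```

The two groupings disagree exactly when job_a did not succeed but job_c did: the first grouping lets the dependent job run, the second blocks it.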
Why Your Dependency Tree Will Break (and How to Fail Fast)
You've wired ten jobs in a cascade with AND conditions, expecting a nice serial execution. Then nothing runs. You check the database — the predecessor ran fine three hours ago.
Here's what happened: AutoSys checks conditions at the moment the job is scheduled to start. If a predecessor finished at 10:00 and your dependent job starts at 12:00, a simple done(jobA) condition should work. But if the predecessor ran in a different box or was manually restarted, its exit status may have been overwritten or never recorded.
The fix: always pin your dependency on a specific run number or instance. Use done(jobA.1) to refer to the first execution of that day. Even better, combine it with exitcode() to catch failed runs before your job even considers starting.
Fail fast means your job alerts you within its first minute of the window, not after you've waited an hour. Store predecessor exit codes in a global variable and check them in a pre-processing job. That single step has saved my team from three post-mortems this year.
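A pre-processing check of this kind might look like the following Python sketch. How the exit codes reach the script is left open here; the exit_codes mapping, the job names, and the function names are hypothetical. A job with no recorded exit code is treated as a failure, so a missing run is caught as early as a bad one.

```python
from typing import Dict, List, Optional


def failed_predecessors(exit_codes: Dict[str, Optional[int]]) -> List[str]:
    """Return predecessors that did not finish cleanly.

    A value of None means no exit code was recorded for that job,
    which is treated as a failure rather than silently ignored.
    """
    return sorted(
        job for job, code in exit_codes.items()
        if code is None or code != 0
    )


def preflight(exit_codes: Dict[str, Optional[int]]) -> int:
    """Return the exit code the pre-processing job should finish with."""
    bad = failed_predecessors(exit_codes)
    for job in bad:
        print(f"predecessor not clean: {job} (exit code {exit_codes[job]})")
    return 1 if bad else 0


# Hypothetical predecessors: one clean, one failed, one never recorded.
sample = {"extract_sales": 0, "validate_sales": 3, "load_rates": None}
print("preflight exit code:", preflight(sample))
```

In a real job the script would finish with sys.exit(preflight(...)), so that a non-zero result marks the pre-processing job as failed within the first minute of the window.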
Never rely on a bare done(jobA) in a chain of more than two jobs. AutoSys has a race condition where the condition check can miss a job that finishes too quickly. Always pin to .1 or a specific box instance.
Pin dependencies to a specific run (done(jobA.1)) and validate exit codes before your job starts. Assume AutoSys will miss one out of every hundred conditions and build your alerting around that.
The sendevent Trap: When Your Manual Override Destroys Your Pipeline
Something is stuck. You run sendevent -E FORCE_STARTJOB job_payroll_merge because the condition hasn't been met and payroll is due. It works. The job runs. You go home.
Next morning: seventeen downstream jobs failed. The FORCE_STARTJOB event set the job's status to SUCCESS but did not update the condition history. Downstream jobs checked for done(job_payroll_merge.1) — AutoSys had no record of that run instance because it was forced, not naturally triggered.
Use sendevent -E CHANGE_STATUS -s SUCCESS instead. This updates the job's status in the database and records the completion time, so conditions evaluate correctly. But even that is dangerous: it marks the job successful without running its command, so it bypasses any exit code logic.
Better approach: write a small wrapper script that checks whether the condition would have been met naturally, and only then forces the start. Log every forced start to a dedicated file with a timestamp and operator ID. You want a paper trail for every time you override the system.
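The logging half of such a wrapper could be sketched in Python as follows. The log format, the function names, and the use of the login name as operator ID are assumptions; the sendevent arguments are the ones that appear in this article. The sketch only builds the command and writes the log entry. Actually executing the command (for example with subprocess.run) and checking whether the condition would have been met naturally are left out.

```python
import getpass
from datetime import datetime, timezone
from typing import List, Optional


def force_start_command(job: str) -> List[str]:
    """Build the sendevent call for a forced start, as an argument list."""
    return ["sendevent", "-E", "FORCE_STARTJOB", "-J", job]


def audit_line(job: str, operator: str, reason: str,
               when: Optional[datetime] = None) -> str:
    """Format one audit-log entry: timestamp, operator, job, reason."""
    when = when or datetime.now(timezone.utc)
    return f"{when.isoformat()} operator={operator} job={job} reason={reason!r}"


def record_forced_start(log_path: str, job: str, reason: str) -> List[str]:
    """Append the audit entry first, then return the command to execute."""
    line = audit_line(job, getpass.getuser(), reason)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
    return force_start_command(job)
```

Writing the log entry before returning the command means an override cannot happen without leaving a trace, even if the forced start itself fails afterwards.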
I've seen this take down a quarterly close pipeline because someone forced a job that skipped a data validation step. The data was corrupt for three weeks.
Tip: alias fstart='sendevent -E CHANGE_STATUS -s SUCCESS -J'. It's one command and won't break downstream conditions. I keep this in my .bashrc on every AutoSys server.
Condition Settings for Various Outcomes
AutoSys conditions are not limited to job success or failure. You can trigger jobs based on termination, exit codes, or even the absence of a job run. The terminated() condition catches jobs killed by sendevent -E KILLJOB or by OS signals like SIGKILL. This prevents your pipeline from hanging indefinitely when a job crashes abnormally. For exit-code precision, use exitcode(job_name) = N to trigger on specific return values. Combined with failure conditions, you can route jobs to remediation scripts only when critical errors occur. A common mistake is assuming done(job_name) distinguishes between outcomes; it does not. If your downstream job must run only when the upstream exits with code 42, you must write exitcode(JOB_A) = 42 rather than a generic success condition. Failed jobs (exit code != 0) require failure(JOB_A). This granularity saves debugging time and prevents cascading failures in multi-step workflows.
Using done() when you need failure() masks real errors. Your pipeline will quietly skip recovery jobs, leading to data corruption that surfaces hours later.
Template Parameters in Conditions
AutoSys job definitions often use templates with variables like %%JOB_NAME or %%BOX_RUN_ID. Conditions in template-based jobs must reference these parameters carefully because the condition string is parsed before variable substitution. If you write condition: done(%%PARENT_JOB), AutoSys will interpret the literal text '%%PARENT_JOB' at parse time. Instead, define conditions using static names and apply the template to concrete instances after substitution. For parameterized dependencies across multiple environments, use global_vars within the condition, referencing variables set by upstream jobs. Example: condition: status(%%DB_HOST_PING, SUCCESS) fails because '%%DB_HOST_PING' is unknown. The fix is to pass host-specific conditions via job overrides or use a lookup table with global_vars. Another pattern: write templates that accept conditions as parameters, then supply the actual job names at instantiation. This keeps your DRY approach intact while ensuring conditions resolve correctly at runtime.
The Silent Deadlock: A job never ran because its predecessor was INACTIVE
Use notrunning() to allow job_B to run if job_A was never started.
- success() and failure() require the referenced job to have run at least once and terminated. INACTIVE jobs never produce a true evaluation.
- For jobs that must always run after a predecessor, even if the predecessor is skipped, use done() or notrunning() with an additional calendar check.
- Always verify referenced jobs are scheduled and not on hold. Use autorep -J job_A to check status before the batch window.
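Part of this verification can be automated. The Python sketch below pulls the referenced job names out of a condition string and reports which of them leave a success(), failure(), or done() term unable to become true until the job runs. The statuses mapping is assumed to have been collected beforehand (for example from autorep output, whose parsing is not shown), and the regular expression only covers the simple function(job_name) form used in this article.

```python
import re
from typing import Dict, List

# Matches terms such as success(job_a) or notrunning(app_job).
TERM = re.compile(r"\b(success|failure|done|notrunning)\s*\(\s*([\w.\-]+)\s*\)",
                  re.IGNORECASE)


def referenced_terms(condition: str) -> List[tuple]:
    """Return (function, job_name) pairs in the order they appear."""
    return [(fn.lower(), job) for fn, job in TERM.findall(condition)]


def blocking_jobs(condition: str, statuses: Dict[str, str]) -> List[str]:
    """Jobs whose INACTIVE (or unknown) status keeps a term from being true.

    notrunning() terms are skipped: per the article they are already
    true for a job that has never run.
    """
    blocked = []
    for fn, job in referenced_terms(condition):
        if fn != "notrunning" and statuses.get(job, "INACTIVE") == "INACTIVE":
            if job not in blocked:
                blocked.append(job)
    return blocked


condition = "(success(job_a) AND success(job_b)) OR notrunning(job_c)"
statuses = {"job_a": "SUCCESS", "job_b": "INACTIVE", "job_c": "INACTIVE"}
print(blocking_jobs(condition, statuses))
```

The check works term by term. Whether a flagged job blocks the whole condition still depends on how the terms are combined with AND and OR.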
- Using success() instead of done(). Cleanup jobs should always use done() so they fire regardless of upstream exit code.
- notrunning() evaluates to true as soon as the referenced job is not running, even if it hasn't started yet. Use success() or done() for sequential execution, not notrunning().
- Check the condition on the stuck job: autorep -q <jobname> | grep -i condition
- Check the status of the referenced job: autorep -J <referenced_job>
- Switch to done(), or force the referenced job to run with sendevent -E FORCE_STARTJOB -J <refjob>.
Key takeaways
AutoSys provides four condition functions: success(), failure(), done(), and notrunning()
Common mistakes to avoid
Using success() instead of done() for cleanup jobs
Not accounting for jobs in INACTIVE state
Use done() if the dependency is not strict. Use autorep -J job_A to verify status before the batch window.
Creating circular dependencies
Misunderstanding OR conditions leading to early triggering
Assuming notrunning() waits for a job to run
Use success() or done() for sequential dependencies. Only use notrunning() when you specifically want to check runtime state.
Interview Questions on This Topic
What is the difference between success() and done() conditions in AutoSys?
success() triggers only when the referenced job completes with exit code 0; done() triggers when the referenced job completes regardless of exit code. So done() runs even if the job failed, while success() only runs on success. Use success() for normal chaining and done() for cleanup or notification jobs that must always execute.