AutoSys FORCE_STARTJOB — Condition Bypass Corrupts Data
FORCE_STARTJOB bypassed validate_ledger and extract_transactions conditions.
20+ years shipping production infrastructure and CI/CD at scale. Notes here come from systems that actually shipped.
- FORCE_STARTJOB: runs job immediately, bypasses ALL conditions (time + dependencies). Use when you need output now, but know what you're skipping.
- KILLJOB: terminates running job → TERMINATED status. Downstream success() conditions won't fire. Use CHANGE_STATUS SUCCESS afterward to unblock.
- RESTART: retry a FAILED or TERMINATED job. Cleaner than FORCE_STARTJOB for retries — audit logs show intent.
- STARTJOB: respects conditions. Job starts only if its time + dependency gates are open. Practically useless in emergencies.
- Production rule: FORCE_STARTJOB without understanding dependencies corrupts data. KILLJOB without CHANGE_STATUS blocks workflows for hours.
Force starting a job is like overriding the traffic light and going anyway. Killing a job is like hitting the emergency stop button. Restarting is like pressing the retry button after a failure. These are your emergency controls for when the normal flow needs intervention.
Production AutoSys environments need manual intervention. Jobs hang. Downstream dependencies get stuck. A fix deploys and you need to rerun a failed job at 2 AM.
Knowing the exact sendevent command is table stakes. Knowing what happens afterward is what separates senior engineers.
FORCE_STARTJOB bypasses conditions. KILLJOB leaves downstream jobs waiting. RESTART is for retries, not first runs. This article covers the side effects that incident post-mortems reveal.
How AutoSys FORCE_STARTJOB Breaks Job Dependencies
FORCE_STARTJOB is an AutoSys command that starts a job immediately, bypassing all upstream conditions, box dependencies, and calendar rules. It ignores the job's defined starting conditions entirely — no waiting for predecessor success, no checking for box status, no respecting time windows. The job runs as if all conditions are met, regardless of reality.
When you issue sendevent -E FORCE_STARTJOB -J job_name, AutoSys sets the job's status to STARTING and launches it. The job's exit code still updates its status (SUCCESS/FAILURE), but the event log records the forced start. Downstream jobs that depend on this job will see the status change and may trigger, even though the data the job processed might be stale or incomplete because upstream dependencies were skipped.
Use FORCE_STARTJOB only when you have manually verified that all upstream data is ready and consistent — typically during disaster recovery or after manual data reconciliation. In production, teams often reach for it to unstick a stalled pipeline, but the bypass corrupts data integrity because downstream consumers see a success signal without the actual dependency chain being satisfied.
FORCE_STARTJOB — bypassing all conditions
FORCE_STARTJOB immediately starts a job regardless of its date_conditions, start_times, or condition dependencies. It's the 'run it now, no questions asked' command.
Critical nuance: FORCE_STARTJOB bypasses EVERYTHING. Not just the schedule. Not just the time gates. Also any condition: success(other_job) dependencies. The job runs even if its upstream dependencies never ran or failed.
This is the most dangerous sendevent command. Use it only when you fully understand what conditions exist on the job.
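A pre-flight check makes that warning concrete. The sketch below is a minimal guard, not an official AutoSys feature: `show_start_gates` and `safe_force_start` are hypothetical helper names, and it assumes `autorep -J JOB -q` prints the job definition with one attribute per line, including a `condition:` line when dependencies exist.

```shell
#!/usr/bin/env bash
# Sketch: refuse to force-start a job that still has dependency conditions.
# Assumption: `autorep -J JOB -q` prints one JIL attribute per line.

# Print the start gates (dependencies and schedule) defined on a job.
show_start_gates() {
  local job="$1"
  autorep -J "$job" -q | grep -E 'condition|date_conditions|start_times'
}

# Force-start only when no "condition:" attribute is present.
# Returns 1 and sends no event if a dependency condition exists.
safe_force_start() {
  local job="$1" gates
  gates="$(show_start_gates "$job" || true)"
  if printf '%s\n' "$gates" | grep -Eq '^[[:space:]]*condition:'; then
    echo "REFUSED: $job has dependency conditions:" >&2
    printf '%s\n' "$gates" >&2
    return 1
  fi
  sendevent -E FORCE_STARTJOB -J "$job"
}
```

Calling `safe_force_start JOBNAME` sends the event only for jobs gated purely by schedule. A refusal is a prompt to go and inspect the upstream jobs, not a verdict on whether the data is ready.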
KILLJOB — terminating a running job
KILLJOB sends a termination signal to the process running on the agent machine. The job moves to TERMINATED status. Any downstream jobs waiting on success() of this job will not start.
Critical nuance: TERMINATED is NOT FAILURE. It's a separate status. A killed job doesn't satisfy success() conditions, and it doesn't satisfy failure() conditions either. A downstream job reacts to a kill only if its condition explicitly tests terminated().
After KILLJOB, the process receives SIGTERM. Well-behaved processes can catch this and clean up. Hung processes may need SIGKILL (AutoSys handles this escalation after a timeout, typically 30 seconds).
RESTART — retrying a failed job
The RESTART event tells AutoSys to rerun a job that is in FAILURE or TERMINATED status. It's cleaner than FORCE_STARTJOB for rerunning failed jobs because it signals intent as a retry in audit logs.
Key difference from FORCE_STARTJOB: RESTART works only on FAILURE or TERMINATED jobs. FORCE_STARTJOB works on any non-running state. RESTART also respects that this is a retry — some AutoSys configurations treat retries differently for alerting purposes.
RESTART does NOT bypass conditions. The job still needs its start conditions satisfied (unless they were the reason it failed — then you have a cycle).
CHANGE_STATUS — manual status override
CHANGE_STATUS is the nuclear option. It manually sets a job's status in the Event Server without running anything. This is how you unblock workflows when a job can't or shouldn't be rerun.
Most common use: A job was killed (KILLJOB) but the work was already done. No point rerunning. Set its status to SUCCESS to fire downstream success() conditions.
Risks: CHANGE_STATUS bypasses ALL validation. No agent communication. No process verification. You're telling AutoSys 'trust me, this job is in this state'. If you're wrong, downstream processes run on incorrect assumptions.
AutoSys Job Recovery: The Incident Report You Weren't Expecting
You force-started a job at 3 AM to unblock the batch pipeline. By 5 AM, five downstream jobs had silently failed because their box conditions never evaluated. The scheduler saw the job as a manual override and skipped all the dependency checks you'd spent weeks building. The production incident report won't mention your name, but your teammates will remember.
When you use FORCE_STARTJOB, AutoSys flags the job with a special attribute that tells the scheduler: "This run is a dirty override — don't trust its completion for dependency resolution." The job runs, exits zero, but its box conditions and success codes are ignored by downstream jobs. The system treats it like a ghost run. That's why intelligent operators build a recovery job — a wrapper that checks the job's exit status and manually updates the downstream conditions using CHANGE_STATUS after verifying the job actually did its work.
The recovery pattern is simple: after the force start completes, run a secondary process that validates the output (file landed, DB updated, API responded), then explicitly sets the condition in the downstream box. Your control room will thank you when the morning batch doesn't implode.
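The recovery pattern above can be sketched as a small guard around CHANGE_STATUS. This is an illustration under stated assumptions: `mark_success_if_output_ok` is a hypothetical helper, and "the output file exists and is non-empty" stands in for whatever validation fits the job (a row count, an API check). The `sendevent -E CHANGE_STATUS -s SUCCESS -J JOB` form follows the syntax used elsewhere in this article.

```shell
#!/usr/bin/env bash
# Sketch: mark a job SUCCESS only after its output has been verified.
# Assumption: a landed, non-empty file is an adequate proof of work here.

# Usage: mark_success_if_output_ok JOB OUTPUT_FILE
mark_success_if_output_ok() {
  local job="$1" output="$2"
  # -s is true only when the file exists and has a size greater than zero.
  if [ ! -s "$output" ]; then
    echo "NOT MARKED: $output is missing or empty; leaving $job unchanged" >&2
    return 1
  fi
  sendevent -E CHANGE_STATUS -s SUCCESS -J "$job"
}
```

Keeping the validation and the status change in one function means the override cannot be issued without the check having passed first.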
KILLJOB Deep Cut: Graceful vs. Slaughter Mode
Standard KILLJOB sends SIGTERM. That's the polite wave that says "please finish your transaction and exit." But when you've got a runaway Java process chewing through 16 GB of heap and ignoring polite requests, you need the nuclear option. AutoSys gives you that with KILLJOB /SIGKILL = 1 — but the documentation hides the consequences.
When you send SIGKILL, AutoSys immediately marks the job as TERMINATED in the scheduler. The process might still be alive for a few seconds before the OS terminates it. But here's the trap: if your job had open file handles (database connections, temporary files, NFS mounts), those get orphaned. The DB connection pool will leak. Temp files stay behind and fill up disk. NFS mounts stay locked.
Senior engineers build a two-stage kill procedure into their runbooks: first KILLJOB (SIGTERM), wait 30 seconds, check autorep -J JOBNAME -w 10 for the TERMINATED status. If it's still RUNNING, then fire KILLJOB /SIGKILL = 1. After that, run a cleanup script that checks for orphaned temp files by PID, kills any residual processes by PGID, and logs the incident with a timestamp for your SRE dashboard. Do this right once, and the 2 AM page becomes a 5-minute fix instead of a 3-hour war room.
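The first stage of that procedure (polite kill, then poll) might look like the sketch below. `wait_until_not_running` and `polite_kill` are hypothetical helper names. The sketch assumes the autorep output contains the literal string RUNNING while the job is active, as described here; if your autorep prints abbreviated status codes, adjust the grep pattern. The escalation step is left to the runbook and is not encoded in the script.

```shell
#!/usr/bin/env bash
# Sketch: stage one of a two-stage kill. Escalation is left to the runbook.
# Assumption: autorep output contains "RUNNING" while the job is active.

# Return 0 once JOB no longer reports RUNNING,
# or 1 if it is still RUNNING after TIMEOUT seconds.
wait_until_not_running() {
  local job="$1" timeout="${2:-30}" interval="${3:-5}" waited=0
  while autorep -J "$job" | grep -q 'RUNNING'; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Send KILLJOB, then wait for the job to leave RUNNING.
polite_kill() {
  local job="$1" timeout="${2:-30}"
  sendevent -E KILLJOB -J "$job"
  if wait_until_not_running "$job" "$timeout"; then
    echo "$job stopped after KILLJOB"
  else
    echo "$job still RUNNING after ${timeout}s; escalate per runbook" >&2
    return 1
  fi
}
```

A non-zero return from `polite_kill JOBNAME 30` is the signal to move to the second stage and the cleanup script.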
Use autorep -J JOBNAME -w 10 to poll job status in your kill scripts. It returns the current status string; grep for RUNNING. Faster than parsing XML.
The FORCE_STARTJOB That Corrupted the Ledger
A reporting job was defined with condition: success(validate_ledger) AND success(extract_transactions). The validate_ledger job had failed earlier. The on-call engineer saw the reporting job in INACTIVE status and used FORCE_STARTJOB.
FORCE_STARTJOB bypassed BOTH conditions. The job ran without validated ledger data. The output was based on incomplete extracts. No alarm fired because the job succeeded.
The team didn't know about the condition dependency because they only looked at autorep -q, which shows conditions but not in an obvious way.
1. Before any FORCE_STARTJOB, run autorep -J jobname -q | grep condition first.
2. For failed dependencies, fix and restart the dependency chain, not the leaf job.
3. Add box_terminator on validation jobs so the box stops before leaf jobs can be force-started externally.
4. Create an audit script that logs all FORCE_STARTJOB events and flags any that bypassed conditions.
- FORCE_STARTJOB bypasses ALL conditions: time AND dependencies.
- Always check autorep -q | grep condition before force-starting.
- Fixing a leaf job without fixing its dependencies propagates corruption.
- Force-start is for schedule overrides, not dependency bypasses.
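One way to approach the audit idea from the list above is to route force-starts through a logging wrapper. This is a sketch under assumptions, not a built-in AutoSys facility: `audited_force_start` and the `AUDIT_LOG` path are hypothetical, and the wrapper only sees force-starts issued through it, not events sent directly with sendevent.

```shell
#!/usr/bin/env bash
# Sketch: force-start a job and record whether a condition was bypassed.
# Assumption: AUDIT_LOG is a hypothetical path; point it at your own log.
AUDIT_LOG="${AUDIT_LOG:-./force_start_audit.log}"

audited_force_start() {
  local job="$1" cond flag
  # Capture the dependency condition line, if the job defines one.
  cond="$(autorep -J "$job" -q | grep -E '^[[:space:]]*condition:' || true)"
  if [ -n "$cond" ]; then
    flag="BYPASSED_CONDITION"
  else
    flag="NO_CONDITION"
  fi
  # Append timestamp, operator, job, flag, and the bypassed condition text.
  printf '%s user=%s job=%s %s %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$job" "$flag" "$cond" \
    >> "$AUDIT_LOG"
  sendevent -E FORCE_STARTJOB -J "$job"
}
```

Running `grep BYPASSED_CONDITION "$AUDIT_LOG"` during a post-mortem then lists every forced run that skipped a dependency.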
Before KILLJOB, check whether downstream jobs depend on success(). If yes, use CHANGE_STATUS -s SUCCESS after kill. Verify with autorep -d.
CHANGE_STATUS fires downstream success() conditions without rerunning the job. Verify autorep shows SUCCESS before downstream runs.
Check start gates: autorep -J JOBNAME -q | grep -E 'condition|date_conditions|start_times'
Force-start: sendevent -E FORCE_STARTJOB -J JOBNAME
Key takeaways
- KILLJOB leaves the job TERMINATED, so downstream success() won't fire. Use CHANGE_STATUS to unblock if work was done.
Common mistakes to avoid
Force-starting a job without checking conditions first
Run autorep -J JOB -q | grep condition before FORCE_STARTJOB. If conditions exist, fix the dependency chain instead. Use RESTART on failed dependencies, then let the original job start naturally.
Killing a job but not unblocking downstream dependencies
Downstream jobs stay blocked on success(). The team manually reruns downstream jobs but misses the killed job's work.
Run sendevent -E LIST_DEPENDENTS -J killed_job. If downstream jobs are waiting, either:
- RESTART the killed job (if work wasn't done)
- CHANGE_STATUS to SUCCESS (if work was done elsewhere)
Using FORCE_STARTJOB when RESTART would be appropriate
Assuming CHANGE_STATUS triggers dependency re-evaluation
Using FORCE_STARTJOB on a child job inside a non-running BOX
Interview Questions on This Topic
What is the difference between FORCE_STARTJOB and STARTJOB?
Frequently Asked Questions