Senior 3 min · March 19, 2026

AutoSys FORCE_STARTJOB — Condition Bypass Corrupts Data

FORCE_STARTJOB bypassed validate_ledger and extract_transactions conditions.

Naren · Founder
Quick Answer
  • FORCE_STARTJOB: runs job immediately, bypasses ALL conditions (time + dependencies). Use when you need output now, but know what you're skipping.
  • KILLJOB: terminates running job → TERMINATED status. Downstream success() conditions won't fire. Use CHANGE_STATUS SUCCESS afterward to unblock.
  • RESTART: retry a FAILED or TERMINATED job. Cleaner than FORCE_STARTJOB for retries — audit logs show intent.
  • STARTJOB: respects conditions. Job starts only if its time + dependency gates are open. Practically useless in emergencies.
  • Production rule: FORCE_STARTJOB without understanding dependencies corrupts data. KILLJOB without CHANGE_STATUS blocks workflows for hours.
Plain-English First

Force starting a job is like overriding the traffic light and going anyway. Killing a job is like hitting the emergency stop button. Restarting is like pressing the retry button after a failure. These are your emergency controls for when the normal flow needs intervention.

Production AutoSys environments need manual intervention. Jobs hang. Downstream dependencies get stuck. A fix deploys and you need to rerun a failed job at 2 AM.

Knowing the exact sendevent command is table stakes. Knowing what happens after — that's the senior engineer difference.

FORCE_STARTJOB bypasses conditions. KILLJOB leaves downstream jobs waiting. RESTART is for retries, not first runs. This article covers the side effects that incident post-mortems reveal.

FORCE_STARTJOB — bypassing all conditions

FORCE_STARTJOB immediately starts a job regardless of its date_conditions, start_times, or condition dependencies. It's the 'run it now, no questions asked' command.

Critical nuance: FORCE_STARTJOB bypasses EVERYTHING. Not just the schedule. Not just the time gates. Also any condition: success(other_job) dependencies. The job runs even if its upstream dependencies never ran or failed.

This is the most dangerous sendevent command. Use it only when you fully understand what conditions exist on the job.

force_start.sh (Bash)
# ALWAYS check what conditions you are about to bypass first
autorep -J daily_report -q | grep condition

# Check current status
autorep -J daily_report

# Force start a single job immediately
sendevent -E FORCE_STARTJOB -J daily_report

# Force start a BOX (starts the box, inner jobs still follow their conditions)
sendevent -E FORCE_STARTJOB -J eod_processing_box

# Schedule the force start for a specific date and time
# (-T queues the event until then; it does not change the job's run date)
sendevent -E FORCE_STARTJOB -J daily_report -T "03/19/2026 02:00"
Output
/* Before force-start: check conditions */
condition: success(extract_trades) AND success(validate_positions)
/* Event sent: FORCE_STARTJOB for daily_report */
/* Job daily_report: STARTING → RUNNING (02:00:01) */
/* Both dependencies were skipped entirely */
Bypassed conditions have consequences
FORCE_STARTJOB skips ALL conditions, including dependency conditions. If daily_report depends on extract_trades completing first, force-starting it means it runs without the extract data. Make sure you understand what conditions you're bypassing.
Production Insight
A team force-started a billing job at 2 AM after a database timeout. The job ran, generated invoices, and sent them to customers. Three days later, the finance team noticed duplicate invoices. The billing job had a condition: success(settlement_validation). The validation job had failed at 1:55 AM. The force-start bypassed it completely.
The fix: Never force-start a job with dependencies. Fix the dependency chain instead. Use RESTART on the failed validation job, let it succeed, then the original job will start normally through conditions.
Diagnosis: autorep -J jobname -q | grep condition shows the dependencies. If non-empty, fix them first.
Rule: FORCE_STARTJOB is for schedule overrides, not dependency bypasses.
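The diagnosis step gets more useful when it names the upstream jobs a force-start would skip. The sketch below is a hypothetical helper, list_bypassed_upstreams (not an AutoSys command). It reads JIL text on stdin and assumes dependencies appear on a `condition:` line in forms such as success(job_name), like the output shown earlier. Look-back or cross-instance syntax is not handled.

```bash
# Hypothetical helper: print the job names referenced in a JIL "condition:" line.
# Assumes input shaped like `autorep -J JOB -q` output, read from stdin.
list_bypassed_upstreams() {
  grep -E '^[[:space:]]*condition:' \
    | grep -oE '[A-Za-z]+\([^)]*\)' \
    | sed -E 's/^[A-Za-z]+\(([^,)]*).*/\1/' \
    | sort -u
}

# Intended use on a host with AutoSys installed:
#   autorep -J daily_report -q | list_bypassed_upstreams
```

If the helper prints anything, check each listed job with autorep before deciding to force-start.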
Key Takeaway
FORCE_STARTJOB bypasses ALL conditions — time AND dependencies.
Always check conditions first: autorep -J JOB -q | grep condition.
Fix failed dependencies, don't skip them. Force-start propagates corruption.
For boxes, start the box, not the child.
Should you FORCE_STARTJOB or fix dependencies first?
If: Job has condition dependencies that are FAILED or never ran
Use: Do NOT force-start. Fix the dependencies first (RESTART or rerun), let conditions trigger naturally.
If: Job has only time-based conditions (start_times, days_of_week)
Use: FORCE_STARTJOB is safe. You're just overriding the schedule.
If: Job has no conditions at all
Use: FORCE_STARTJOB works, but why isn't the job running? Check date_conditions and start_times first.
If: Job is inside a BOX that's not RUNNING
Use: FORCE_STARTJOB the BOX, not the inner job. Force-starting an inner job without the box running causes inconsistent state.
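The first three branches of this decision can be sketched as a small classifier over a job's JIL. classify_start_gates is a hypothetical name, and the sketch assumes the JIL arrives on stdin with `condition:` and `date_conditions:` attributes on their own lines. It does not cover the BOX branch, which needs the box's live status.

```bash
# Hypothetical helper: classify which start gates a job definition has.
# Prints one of: dependencies | time-only | none
classify_start_gates() {
  local jil
  jil=$(cat)
  if printf '%s\n' "$jil" | grep -qE '^[[:space:]]*condition:[[:space:]]*[^[:space:]]'; then
    echo "dependencies"   # fix upstream jobs first, do not force-start
  elif printf '%s\n' "$jil" | grep -qE '^[[:space:]]*date_conditions:[[:space:]]*(1|y)'; then
    echo "time-only"      # force-start only overrides the schedule
  else
    echo "none"           # nothing to bypass; find out why it is not running
  fi
}

# Intended use:
#   autorep -J daily_report -q | classify_start_gates
```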
[Figure: Manual Job Control Flow (when and how to intervene)]
Job in FAILURE or stuck (conditions met but job not running) → Read the error log FIRST (cat std_err_file; never skip this) → Fix the root cause (code fix, space cleared, DB restarted) → RESTART or FORCE_STARTJOB (RESTART for failed, FORCE for bypass) → Monitor to SUCCESS (watch autorep -J jobname) → Verify actual output (check file/row count, not just status)
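The last step of the flow, verifying actual output rather than trusting the status, is the one most often skipped. A minimal sketch, assuming the job writes a line-oriented file; verify_row_count and its arguments are hypothetical, and the right threshold depends on your data.

```bash
# Hypothetical helper: succeed only if the output file exists and has at
# least the expected number of lines.
verify_row_count() {
  local file=$1 min_rows=$2 rows
  if [ ! -f "$file" ]; then
    echo "missing output file: $file" >&2
    return 1
  fi
  rows=$(( $(wc -l < "$file") ))
  if [ "$rows" -lt "$min_rows" ]; then
    echo "only $rows rows in $file, expected at least $min_rows" >&2
    return 1
  fi
  echo "ok: $rows rows in $file"
}

# Intended use after the job reports SUCCESS (path is an example):
#   verify_row_count /data/out/daily_report.csv 1000
```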

KILLJOB — terminating a running job

KILLJOB sends a termination signal to the process running on the agent machine. The job moves to TERMINATED status. Any downstream jobs waiting on success() of this job will not start.

Critical nuance: TERMINATED is NOT FAILURE. It's a separate status. A killed job doesn't satisfy success() conditions, and it doesn't satisfy failure() conditions either. Downstream logic reacts to a kill only if a condition explicitly checks terminated().

After KILLJOB, the process receives SIGTERM. Well-behaved processes can catch this and clean up. Hung processes may need SIGKILL (AutoSys handles this escalation after a timeout, typically 30 seconds).

kill_job.sh (Bash)
# Kill a running job
sendevent -E KILLJOB -J hung_etl_job

# After killing, check status
autorep -J hung_etl_job

# Check which downstream jobs depend on it
job_depends -d -J hung_etl_job

# If downstream jobs must proceed and you have verified the work is done,
# mark the killed job as SUCCESS
sendevent -E CHANGE_STATUS -J hung_etl_job -s SUCCESS

# Check downstream jobs are now unblocked
autorep -J downstream_job
Output
Job Name ST Exit
hung_etl_job TE -- <- TERMINATED after KILLJOB
Dependents:
downstream_job: waiting on success(hung_etl_job) → condition false
After CHANGE_STATUS SUCCESS:
downstream_job: condition met → job will start normally
KILLJOB vs term_run_time
KILLJOB is a manual kill sent by an operator. term_run_time is an automatic kill triggered by AutoSys when a job exceeds its maximum runtime. Both result in TERMINATED status. The difference is who or what initiated the kill.
Production Insight
A nightly ETL job hung at 3 AM. The on-call engineer killed it with KILLJOB. The job moved to TERMINATED. Five downstream jobs were waiting on success(etl_job). They never started. At 8 AM, the dashboard was empty. The team manually ran the downstream jobs, but the ETL hadn't completed — they ran on stale data.
The mistake: Killing the job doesn't complete the work. The engineer should have investigated why it hung, fixed the root cause (a database lock), then restarted the ETL job properly.
If you MUST unblock downstream without rerunning the killed job, use CHANGE_STATUS SUCCESS after the kill. But this is a bandage — the data may still be incomplete.
Diagnosis: job_depends -d -J killed_job shows what's blocked.
Rule: KILLJOB terminates the process. It doesn't complete the work. Fix the root cause, then rerun properly. CHANGE_STATUS is for emergencies only.
Key Takeaway
KILLJOB → TERMINATED status. Downstream success() won't trigger.
Use CHANGE_STATUS SUCCESS after kill to unblock dependencies (emergencies only).
job_depends -d shows what will be blocked before you kill.
TERMINATED ≠ FAILURE ≠ SUCCESS. Know the difference.
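To make the status distinction concrete, here is a small sketch that translates the two-letter codes from the autorep ST column and reports whether a downstream success() condition could be satisfied. The function names are hypothetical, and only the codes discussed in this article are mapped.

```bash
# Hypothetical helpers: interpret the ST column from `autorep -J JOB`.
status_name() {
  case "$1" in
    SU) echo "SUCCESS" ;;
    FA) echo "FAILURE" ;;
    TE) echo "TERMINATED" ;;
    RU) echo "RUNNING" ;;
    IN) echo "INACTIVE" ;;
    *)  echo "UNKNOWN" ;;
  esac
}

# Returns 0 only when a success(job) condition on this job would be true.
success_gate_open() {
  [ "$1" = "SU" ]
}

explain_status() {
  local code=$1
  if success_gate_open "$code"; then
    echo "$(status_name "$code"): success() dependents can start"
  else
    echo "$(status_name "$code"): success() dependents stay blocked"
  fi
}
```

For example, explain_status TE reports that dependents stay blocked, which is exactly the state a KILLJOB leaves behind.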

RESTART — retrying a failed job

The RESTART event tells AutoSys to rerun a job that is in FAILURE or TERMINATED status. It's cleaner than FORCE_STARTJOB for rerunning failed jobs because it signals intent as a retry in audit logs.

Key difference from FORCE_STARTJOB: RESTART works only on FAILURE or TERMINATED jobs. FORCE_STARTJOB works on any non-running state. RESTART also respects that this is a retry — some AutoSys configurations treat retries differently for alerting purposes.

RESTART does NOT bypass conditions. The job still needs its start conditions satisfied (unless they were the reason it failed — then you have a cycle).

restart_job.sh (Bash)
# Before restarting, check WHY it failed: run details and exit code
autorep -J failed_extract_job -d

# Look up the command, then run it manually on the agent to confirm the fix
autorep -J failed_extract_job -q | grep command

# Restart the failed job once the root cause is fixed
sendevent -E RESTART -J failed_extract_job

# Pattern: find jobs in FAILURE (ST column = FA) and restart them
# (for transient failures only)
autorep -J % | awk '/ FA / {print $1}' | while read -r job; do
  echo "Restarting failed job: $job"
  sendevent -E RESTART -J "$job"
done

# For TERMINATED jobs (killed), RESTART also works
sendevent -E RESTART -J killed_job
Output
Restarting failed job: extract_trades
Restarting failed job: load_positions
/* Job extract_trades: FAILURE → STARTING → RUNNING */
/* Restart treated as a new run, not a continuation */
Always check the failure reason before RESTART
RESTARTing without fixing the root cause just fails again. Use autorep -J JOBNAME -d to see the last run's events and exit code, and read the job's std_err_file for the actual error. If it's a transient issue (network timeout), RESTART works. If it's a persistent issue (missing file), RESTART won't help until the cause is fixed.
Production Insight
An engineer restarted a failed job 6 times over 2 hours. Each restart failed in 30 seconds. The root cause was a missing directory that the upstream team needed to create. The restarts were pointless and flooded the logs.
The fix: Check the error log first. If the error is transient (connection timeout, temporary lock), RESTART is appropriate. If it's permanent (missing file, permission denied, syntax error), RESTART just creates noise.
Better approach: After a failure, run autorep -J JOB -d and grep the job's std_err_file for error keywords. If you see 'No such file' or 'Permission denied', do NOT restart. Fix the root cause first.
Rule: RESTART is for TRANSIENT failures. For permanent failures, fix first, then RESTART once.
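The transient-versus-permanent triage can be sketched as a keyword check over the job's error output. classify_failure is a hypothetical helper and the keyword lists are assumptions taken from the examples above; real logs will need patterns tuned to your jobs.

```bash
# Hypothetical helper: rough triage of a job's error log read from stdin.
# Prints one of: permanent | transient | unknown
classify_failure() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -qiE 'no such file|permission denied|syntax error'; then
    echo "permanent"   # fix the cause first, then RESTART once
  elif printf '%s\n' "$log" | grep -qiE 'timed out|timeout|temporar'; then
    echo "transient"   # a RESTART is reasonable
  else
    echo "unknown"     # read the log before touching the job
  fi
}

# Intended use (path is whatever std_err_file points to in the JIL):
#   classify_failure < /path/to/job.err
```

Permanent patterns are checked first on purpose: a log that contains both a timeout and a missing file should not be retried blindly.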
Key Takeaway
RESTART works on FAILURE or TERMINATED jobs only.
Check failure reason first: autorep -J JOB -d plus the job's std_err_file.
RESTART is for retries, not first runs. Use FORCE_STARTJOB for schedule overrides.
RESTART respects conditions. It's not a magic bypass.

CHANGE_STATUS — manual status override

CHANGE_STATUS is the nuclear option. It manually sets a job's status in the Event Server without running anything. This is how you unblock workflows when a job can't or shouldn't be rerun.

Most common use: A job was killed (KILLJOB) but the work was already done. No point rerunning. Set its status to SUCCESS to fire downstream success() conditions.

Risks: CHANGE_STATUS bypasses ALL validation. No agent communication. No process verification. You're telling AutoSys 'trust me, this job is in this state'. If you're wrong, downstream processes run on incorrect assumptions.

change_status.sh (Bash)
# After a KILLJOB, mark it as SUCCESS to unblock downstream
sendevent -E CHANGE_STATUS -J hung_job -s SUCCESS

# Manually mark a job as FAILURE (e.g., if you see it's going to fail)
sendevent -E CHANGE_STATUS -J running_job -s FAILURE

# Mark a job as INACTIVE to prevent it from running
sendevent -E CHANGE_STATUS -J job_on_hold -s INACTIVE

# Verify the status change took effect (check the ST column)
autorep -J hung_job

# Dangerous: Mark a job as SUCCESS that never ran
# Only do this if you have verified the work was done elsewhere
sendevent -E CHANGE_STATUS -J never_ran_job -s SUCCESS
Output
/* Job status changed from TE (TERMINATED) to SU (SUCCESS) */
/* Downstream jobs with success(hung_job) now evaluate as true */
/* No actual process ran — you are asserting correctness */
CHANGE_STATUS is a manual override — use sparingly
Every CHANGE_STATUS SUCCESS on a job that didn't actually succeed is a potential data corruption event. Document every use. Audit change logs weekly. If you're using it more than once per month, you have a deeper reliability problem.
Production Insight
A team had a job that wrote a file and then a downstream job processed it. The upstream job completed SUCCESS. The file was on disk. The file watcher downstream didn't trigger due to a run_window misconfiguration. The team needed the downstream to run now.
Instead of fixing the run_window (would require update_job, JIL change, tested in non-prod), they used CHANGE_STATUS on the upstream job. But the upstream was already SUCCESS. That did nothing.
They then tried CHANGE_STATUS on the downstream job to move it from RUNNING to SUCCESS, but the downstream job was actually INACTIVE (still waiting on the file). CHANGE_STATUS doesn't trigger file detection.
Two hours of confusion later, they fixed the run_window and the workflow completed.
Lesson: CHANGE_STATUS doesn't magically trigger dependencies. It just changes the stored status. A file watcher won't trigger because you changed its status. A condition won't re-evaluate because you changed an upstream status — you need a new event (like a job completing) to trigger re-evaluation.
After CHANGE_STATUS, you may need to trigger the dependent job manually with FORCE_STARTJOB.
Key Takeaway
CHANGE_STATUS manually sets job status. No agent call. No verification.
Use after KILLJOB to unblock downstream when work is actually done.
Does NOT trigger dependency re-evaluation automatically.
Document every use. Frequent use = process problem, not tool problem.
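"Document every use" is easier to enforce when the record is written before the command is sent. A minimal sketch, assuming a tab-separated log file is acceptable; log_intervention and the INTERVENTION_LOG variable are hypothetical names, not part of AutoSys.

```bash
# Hypothetical helper: append one audit line per manual intervention.
# Refuses to record (and returns non-zero) when no reason is given.
log_intervention() {
  local event=$1 job=$2 reason=$3
  local logfile=${INTERVENTION_LOG:-./sendevent_interventions.log}
  if [ -z "$reason" ]; then
    echo "refusing: a reason is required" >&2
    return 1
  fi
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$event" "$job" "$reason" \
    >> "$logfile"
}

# Intended use: only send the event if the reason was recorded.
#   log_intervention CHANGE_STATUS hung_job "output file verified by hand" \
#     && sendevent -E CHANGE_STATUS -J hung_job -s SUCCESS
```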
● Production incident · Post-mortem · Severity: high

The FORCE_STARTJOB That Corrupted the Ledger

Symptom
Job completed SUCCESS. All downstream jobs ran normally. No errors in any AutoSys logs. The final output was wrong — incorrect aggregations, missing transactions. Manual reconciliation took three days to identify the root cause.
Assumption
The engineer assumed the job only had time-based conditions (start_times). FORCE_STARTJOB would just override the schedule. They didn't check the full JIL for condition dependencies.
Root cause
The reporting job had condition: success(validate_ledger) AND success(extract_transactions). The validate_ledger job had failed earlier. The on-call engineer saw the reporting job in INACTIVE status and used FORCE_STARTJOB, which bypassed BOTH conditions. The job ran without validated ledger data, and the output was based on incomplete extracts. No alarm fired because the job succeeded. The team missed the condition dependency because they only skimmed the autorep -q output, where the condition line is easy to overlook among the other JIL attributes.
Fix
1. Never FORCE_STARTJOB a job without running autorep -J jobname -q | grep condition first.
2. For failed dependencies, fix and restart the dependency chain, not the leaf job.
3. Add box_terminator on validation jobs so the box stops before leaf jobs can be force-started externally.
4. Create an audit script that logs all FORCE_STARTJOB events and flags any that bypassed conditions.
Key lesson
  • FORCE_STARTJOB bypasses ALL conditions — time AND dependencies.
  • Always check autorep -q | grep condition before force-starting.
  • Fixing a leaf job without fixing its dependencies propagates corruption.
  • Force-start is for schedule overrides, not dependency bypasses.
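The audit script from the fix list could start as small as this. It is a sketch under assumptions: audit_force_start is a hypothetical name, the JIL is piped in on stdin, and the output is a single tab-separated line you would append to your own log.

```bash
# Hypothetical helper: emit an audit line for a FORCE_STARTJOB and flag
# whether the job's JIL (read from stdin) contains a dependency condition.
audit_force_start() {
  local job=$1 cond flag
  cond=$(grep -E '^[[:space:]]*condition:' \
    | sed -E 's/^[[:space:]]*condition:[[:space:]]*//')
  if [ -n "$cond" ]; then
    flag="BYPASSED_CONDITIONS"
  else
    flag="no_conditions"
  fi
  printf '%s\tFORCE_STARTJOB\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$job" "$flag" "$cond"
}

# Intended use:
#   autorep -J reporting_job -q | audit_force_start reporting_job >> force_start_audit.log
```

A weekly grep for BYPASSED_CONDITIONS in that log would have surfaced this incident on day one instead of day three.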
Production debug guide: what to run when jobs won't start or won't die (4 entries)
Symptom · 01
Job stuck in RUNNING, need to terminate it
Fix
Use KILLJOB. Then check if downstream jobs are blocked on success(). If yes, use CHANGE_STATUS -s SUCCESS after kill. Verify with autorep -d.
Symptom · 02
Job won't start — INACTIVE or condition not met, but you need output now
Fix
First check dependencies: autorep -J JOB -q | grep condition. If dependencies exist, fix them first. Otherwise use FORCE_STARTJOB.
Symptom · 03
Job failed on transient error, you fixed the cause, need to retry
Fix
Use RESTART (not FORCE_STARTJOB). RESTART is semantically a retry and respects that the job previously failed. Check the failure first: autorep -J JOB -d and the job's std_err_file.
Symptom · 04
Downstream jobs waiting on a job that was killed or manually fixed
Fix
Use CHANGE_STATUS -s SUCCESS on the upstream job. This fires success() conditions without rerunning the job. Verify autorep shows SUCCESS before downstream runs.
★ sendevent: 60-Second Emergency Reference
Run these commands when normal scheduling fails and you need to intervene
Job won't start — need output now regardless of conditions
Immediate action
Check conditions first, then force-start
Commands
autorep -J JOBNAME -q | grep -E 'condition|date_conditions|start_times'
sendevent -E FORCE_STARTJOB -J JOBNAME
Fix now
sendevent -E FORCE_STARTJOB -J JOBNAME
Job hung: needs termination
Immediate action
Kill the job, then unblock downstream
Commands
sendevent -E KILLJOB -J JOBNAME
sendevent -E CHANGE_STATUS -J JOBNAME -s SUCCESS
Fix now
sendevent -E KILLJOB -J JOBNAME && sendevent -E CHANGE_STATUS -J JOBNAME -s SUCCESS
Job failed after transient error: need retry
Immediate action
Verify failure reason, then restart
Commands
autorep -J JOBNAME -d | grep -i FAILURE
sendevent -E RESTART -J JOBNAME
Fix now
sendevent -E RESTART -J JOBNAME
Downstream blocked on job that won't rerun
Immediate action
Manually set upstream to SUCCESS
Commands
autorep -J UPSTREAM_JOB
sendevent -E CHANGE_STATUS -J UPSTREAM_JOB -s SUCCESS
Fix now
sendevent -E CHANGE_STATUS -J UPSTREAM_JOB -s SUCCESS
sendevent Commands — When to Use Which
| Event | Respects conditions? | Works on status | What actually happens | Typical use |
|---|---|---|---|---|
| FORCE_STARTJOB | No, bypasses all | Any non-running state | Job starts immediately, no condition checks | Run job outside its schedule (fixing missed window) |
| STARTJOB | Yes | INACTIVE / ACTIVATED | Job starts only if time and conditions are met | Trigger after fixing a condition (rarely used) |
| RESTART | Yes (transient failures only) | FAILURE / TERMINATED | Reruns failed job, respects that it's a retry | Rerun after fixing a transient failure |
| KILLJOB | N/A | RUNNING only | SIGTERM to agent process → TERMINATED status | Terminate hung or infinite-loop job |
| CHANGE_STATUS + SUCCESS | N/A | Any | Manually sets status without running anything | Unblock downstream after manual verification |
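The "Works on status" column can be turned into a pre-flight check. The sketch below encodes the table as written; event_allowed is a hypothetical function, and the status names are passed in as plain strings rather than read from AutoSys.

```bash
# Hypothetical helper: does the table above say this event applies to a job
# in this status? Returns 0 (yes), 1 (no), or 2 (event not in the table).
event_allowed() {
  local event=$1 status=$2
  case "$event" in
    FORCE_STARTJOB) [ "$status" != "RUNNING" ] ;;
    STARTJOB)       [ "$status" = "INACTIVE" ] || [ "$status" = "ACTIVATED" ] ;;
    RESTART)        [ "$status" = "FAILURE" ] || [ "$status" = "TERMINATED" ] ;;
    KILLJOB)        [ "$status" = "RUNNING" ] ;;
    CHANGE_STATUS)  return 0 ;;
    *)              return 2 ;;
  esac
}

# Example:
#   event_allowed RESTART TERMINATED && echo "RESTART applies"
```

Passing this check only means the event fits the job's status. It says nothing about whether the intervention is safe for the data.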

Key takeaways

1. FORCE_STARTJOB bypasses ALL conditions: time AND dependencies. Fix dependencies first.
2. KILLJOB → TERMINATED. Downstream success() won't fire. Use CHANGE_STATUS to unblock if work was done.
3. RESTART is for retrying FAILED/TERMINATED jobs after transient failures. Check failure reason first.
4. CHANGE_STATUS is a manual override that doesn't trigger dependency re-evaluation automatically.
5. Always check conditions before force-starting: autorep -J JOB -q | grep condition.
6. Document every manual intervention. If you're using these commands weekly, your automation is broken.

Common mistakes to avoid

5 patterns

Force-starting a job without checking conditions first

Symptom
Job runs but produces corrupt output because upstream dependencies were skipped. Downstream processes continue on bad data. Error is caught days later during reconciliation.
Fix
Always run autorep -J JOB -q | grep condition before FORCE_STARTJOB. If conditions exist, fix the dependency chain instead. Use RESTART on failed dependencies, then let the original job start naturally.

Killing a job but not unblocking downstream dependencies

Symptom
Killed job shows TERMINATED. Downstream jobs stay INACTIVE waiting for success(). The team manually reruns the downstream jobs, but they run without the killed job's output.
Fix
After KILLJOB, run job_depends -d -J killed_job. If downstream jobs are waiting, either:
  • RESTART the killed job (if work wasn't done)
  • CHANGE_STATUS to SUCCESS (if work was done elsewhere)

Using FORCE_STARTJOB when RESTART would be appropriate

Symptom
Job fails transiently. Engineer force-starts instead of restarting. Audit logs show FORCE_STARTJOB on a FAILED job, making it look like an emergency bypass when it was just a retry.
Fix
Use RESTART for retrying failed jobs. It signals intent correctly in logs and respects that this is a retry (some alerting systems treat RESTART differently from FORCE_STARTJOB).

Assuming CHANGE_STATUS triggers dependency re-evaluation

Symptom
Engineer changes a job from FAILURE to SUCCESS. Downstream jobs with success(upstream) still don't start. Team assumes AutoSys is broken.
Fix
CHANGE_STATUS changes the stored status but does NOT trigger a full dependency re-evaluation. After CHANGE_STATUS, either:
  • Send a dummy event to trigger re-evaluation (e.g., sendevent -E JOB_STATUS_CHANGED)
  • Force-start the downstream job directly
  • For critical paths, the upstream job may need to actually rerun

Using FORCE_STARTJOB on a child job inside a non-running BOX

Symptom
Force-started child job runs, but the BOX still shows INACTIVE or FAILURE. Downstream jobs that depend on the BOX never run because the BOX never succeeded.
Fix
Force-start the BOX, not the child. The child will start naturally if its conditions are met. If you must run only the child, the BOX state will be inconsistent — consider moving the job outside the BOX for manual runs.
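To apply this fix, you first need to know whether the job sits inside a box. A minimal sketch, assuming the JIL is piped in on stdin and uses the box_name attribute; parent_box and force_start_target are hypothetical helpers, and they do not check whether the box is currently RUNNING.

```bash
# Hypothetical helper: print the enclosing box from JIL on stdin, if any.
parent_box() {
  sed -nE 's/^[[:space:]]*box_name:[[:space:]]*([^[:space:]]+).*/\1/p' | head -n 1
}

# Hypothetical helper: print the job name to force-start. This is the box
# when the job has one, otherwise the job itself.
force_start_target() {
  local job=$1 box
  box=$(parent_box)
  if [ -n "$box" ]; then
    echo "$box"
  else
    echo "$job"
  fi
}

# Intended use:
#   target=$(autorep -J child_job -q | force_start_target child_job)
#   echo "Would run: sendevent -E FORCE_STARTJOB -J $target"
```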
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Junior): What is the difference between FORCE_STARTJOB and STARTJOB?
Q02 (Junior): What happens to a job's status after KILLJOB?
Q03 (Senior): What is the difference between RESTART and FORCE_STARTJOB for a failed j...
Q04 (Senior): How would you unblock dependent jobs after a job was killed?
Q05 (Junior): If term_run_time terminates a job, what status does it move to?
Q06 (Senior): What are the risks of using CHANGE_STATUS to mark a job as SUCCESS when ...
Q01 of 06 (Junior)

What is the difference between FORCE_STARTJOB and STARTJOB?

ANSWER
FORCE_STARTJOB starts the job immediately, bypassing ALL starting conditions including date_conditions, start_times, and condition dependencies. STARTJOB respects conditions — the job will only start if its time schedule and dependency conditions are all true. In practice, STARTJOB is rarely used because if conditions are already true, the job would have started on its own. FORCE_STARTJOB is the emergency override. STARTJOB might be used after manually fixing a condition that AutoSys hasn't re-evaluated yet.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01. How do I force start an AutoSys job from the command line?
02. What happens after KILLJOB in AutoSys?
03. What is the difference between RESTART and FORCE_STARTJOB?
04. Can I force start a job that is inside a BOX?
05. What status does a job have after term_run_time kills it?
06. Does CHANGE_STATUS automatically trigger downstream jobs?