Senior 4 min · March 19, 2026
Force Start and Kill Job in AutoSys

AutoSys FORCE_STARTJOB — Condition Bypass Corrupts Data

FORCE_STARTJOB bypassed validate_ledger and extract_transactions conditions.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Notes here come from systems that actually shipped.

Production tested · last updated May 23, 2026
Quick Answer
  • FORCE_STARTJOB: runs job immediately, bypasses ALL conditions (time + dependencies). Use when you need output now, but know what you're skipping.
  • KILLJOB: terminates running job → TERMINATED status. Downstream success() conditions won't fire. Use CHANGE_STATUS SUCCESS afterward to unblock.
  • RESTART: retry a FAILED or TERMINATED job. Cleaner than FORCE_STARTJOB for retries — audit logs show intent.
  • STARTJOB: respects conditions. Job starts only if its time + dependency gates are open. Practically useless in emergencies.
  • Production rule: FORCE_STARTJOB without understanding dependencies corrupts data. KILLJOB without CHANGE_STATUS blocks workflows for hours.
Definition · ~90s read
What is Force Start and Kill Job in AutoSys?

AutoSys is an enterprise job scheduling platform (by Broadcom/CA) that orchestrates batch workflows across distributed systems. Its core value is dependency-driven execution: jobs start only when upstream jobs succeed, conditions like exit codes are met, or time windows open.

This deterministic chain is what prevents data corruption in pipelines — a downstream ETL job, for example, should never run before its source data is fully staged. When you bypass these safeguards, you're not just skipping a check; you're breaking the contract that keeps your data consistent.

FORCE_STARTJOB is a manual override that starts a job immediately, ignoring all conditions — predecessor statuses, calendar rules, resource locks, and even running instances. It's the equivalent of telling a database 'commit anyway' after a constraint violation.

In practice, this means you can launch a job that expects data from a still-running upstream process, or one that writes to a file currently being read by another job. The result is partial writes, duplicate records, or corrupted state that cascades silently through downstream dependencies.

KILLJOB terminates a running job abruptly — no cleanup, no rollback, no signal to dependent jobs. Combined with FORCE_STARTJOB, you get a race condition factory.

RESTART and CHANGE_STATUS are slightly less destructive but still dangerous when misused. RESTART re-executes a failed job from scratch, which is fine if the job is idempotent — but many aren't. CHANGE_STATUS lets you manually set a job to SUCCESS or FAILURE, effectively lying to the scheduler.

This can trick downstream jobs into starting prematurely or skipping entirely. The incident report you'll inevitably write after using these commands will detail exactly which file got truncated, which table got double-loaded, and which downstream process failed at 3 AM because its input was half-written.

The fix is always the same: let the scheduler do its job, or build proper idempotency and retry logic into your jobs instead of reaching for the manual override.

Plain-English First

Force starting a job is like overriding the traffic light and going anyway. Killing a job is like hitting the emergency stop button. Restarting is like pressing the retry button after a failure. These are your emergency controls for when the normal flow needs intervention.

Production AutoSys environments need manual intervention. Jobs hang. Downstream dependencies get stuck. A fix deploys and you need to rerun a failed job at 2 AM.

Knowing the exact sendevent command is table stakes. Knowing what happens after — that's the senior engineer difference.

FORCE_STARTJOB bypasses conditions. KILLJOB leaves downstream jobs waiting. RESTART is for retries, not first runs. This article covers the side effects that incident post-mortems reveal.

How AutoSys FORCE_STARTJOB Breaks Job Dependencies

FORCE_STARTJOB is an AutoSys command that starts a job immediately, bypassing all upstream conditions, box dependencies, and calendar rules. It ignores the job's defined starting conditions entirely — no waiting for predecessor success, no checking for box status, no respecting time windows. The job runs as if all conditions are met, regardless of reality.

When you issue sendevent -E FORCE_STARTJOB -J job_name, AutoSys sets the job's status to STARTING and launches it. The job's exit code still updates its status (SUCCESS/FAILURE), but the event log records the forced start. Downstream jobs that depend on this job will see the status change and may trigger, even though the data the job processed might be stale or incomplete because upstream dependencies were skipped.

Use FORCE_STARTJOB only when you have manually verified that all upstream data is ready and consistent — typically during disaster recovery or after manual data reconciliation. In production, teams often reach for it to unstick a stalled pipeline, but the bypass corrupts data integrity because downstream consumers see a success signal without the actual dependency chain being satisfied.

Bypass ≠ Fix
FORCE_STARTJOB does not fix the root cause of a stalled job — it only masks the symptom by skipping validation.
Production Insight
A team used FORCE_STARTJOB on a daily ETL load job after its upstream feed failed to arrive. The job ran against an empty source, wrote zero rows to the warehouse, and reported SUCCESS. Downstream dashboards showed no data for 8 hours before the silent failure was caught.
Symptom: downstream reports show zeros or stale data, but all jobs show SUCCESS status.
Rule: never FORCE_STARTJOB unless you have independently verified the upstream data — check file timestamps, row counts, or source system status first.
Key Takeaway
FORCE_STARTJOB bypasses all conditions — treat it as a manual override, not a recovery tool.
A forced job's SUCCESS status is a lie if upstream data is missing or incomplete.
Always validate data integrity after a forced start before allowing downstream jobs to run.
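The rule above (check file timestamps before forcing) can be scripted as a pre-flight gate. This is a minimal sketch, assuming the upstream feed lands as a flat file; the function name `upstream_ready`, the age threshold, and the example path are hypothetical and not part of AutoSys.

```bash
#!/bin/sh
# Pre-flight check before a force start (sketch; names and thresholds are assumptions).
# upstream_ready FILE MAX_AGE_MIN
#   returns 0 only if FILE exists, is non-empty,
#   and was modified within the last MAX_AGE_MIN minutes.
upstream_ready() (
    file=$1; max_age_min=$2

    [ -s "$file" ] || { echo "missing or empty: $file" >&2; exit 1; }

    # find prints the path only if it was modified less than MAX_AGE_MIN minutes ago
    fresh=$(find "$file" -mmin "-$max_age_min" -print)
    [ -n "$fresh" ] || { echo "stale: $file" >&2; exit 1; }
)
```

A possible use, with a made-up feed path: `upstream_ready /data/in/trades.csv 120 && sendevent -E FORCE_STARTJOB -J daily_report`. The force start is only sent when the feed is present and recent.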
Figure: Manual Job Control Flow (when and how to intervene). Job in FAILURE / stuck (conditions met but job not running) → Read error log FIRST (cat std_err_file; never skip this) → Fix the root cause (code fix, space cleared, DB restarted) → RESTART or FORCE_STARTJOB (RESTART for failed, FORCE for bypass) → Monitor to SUCCESS (watch autorep -J jobname) → Verify actual output (check file/row count, not just status).

FORCE_STARTJOB — bypassing all conditions

FORCE_STARTJOB immediately starts a job regardless of its date_conditions, start_times, or condition dependencies. It's the 'run it now, no questions asked' command.

Critical nuance: FORCE_STARTJOB bypasses EVERYTHING. Not just the schedule. Not just the time gates. Also any condition: success(other_job) dependencies. The job runs even if its upstream dependencies never ran or failed.

This is the most dangerous sendevent command. Use it only when you fully understand what conditions exist on the job.

force_start.sh (Bash)
# Force start a single job immediately
sendevent -E FORCE_STARTJOB -J daily_report

# Force start a BOX (starts the box, inner jobs still follow their conditions)
sendevent -E FORCE_STARTJOB -J eod_processing_box

# Schedule the force start for a specific time (-T takes "MM/DD/YYYY HH:MM")
sendevent -E FORCE_STARTJOB -J daily_report -T "03/19/2026 02:00"

# Check what conditions you're about to bypass — ALWAYS do this first
autorep -J daily_report -q | grep condition

# Check current status
autorep -J daily_report
Output
/* Before force-start: check conditions */
condition: success(extract_trades) AND success(validate_positions)
/* Event sent: FORCE_STARTJOB for daily_report */
/* Job daily_report: STARTING → RUNNING (02:00:01) */
/* Both dependencies were skipped entirely */
Bypassed conditions have consequences
FORCE_STARTJOB skips ALL conditions, including dependency conditions. If daily_report depends on extract_trades completing first, force-starting it means it runs without the extract data. Know exactly which conditions you are bypassing before you send the event.
Production Insight
A team force-started a billing job at 2 AM after a database timeout. The job ran, generated invoices, and sent them to customers. Three days later, the finance team noticed duplicate invoices. The billing job had a condition: success(settlement_validation). The validation job had failed at 1:55 AM. The force-start bypassed it completely.
The fix: Never force-start a job with dependencies. Fix the dependency chain instead. Use RESTART on the failed validation job, let it succeed, then the original job will start normally through conditions.
Diagnosis: autorep -J jobname -q | grep condition shows the dependencies. If non-empty, fix them first.
Rule: FORCE_STARTJOB is for schedule overrides, not dependency bypasses.
Key Takeaway
FORCE_STARTJOB bypasses ALL conditions — time AND dependencies.
Always check conditions first: autorep -J JOB -q | grep condition.
Fix failed dependencies, don't skip them. Force-start propagates corruption.
For boxes, start the box, not the child.
Should you FORCE_STARTJOB or fix dependencies first?
If: Job has condition dependencies that are FAILED or never ran
Use: Do NOT force-start. Fix the dependencies first (RESTART or rerun), let conditions trigger naturally.
If: Job has only time-based conditions (start_times, days_of_week)
Use: FORCE_STARTJOB is safe. You're just overriding the schedule.
If: Job has no conditions at all
Use: FORCE_STARTJOB works, but why isn't the job running? Check date_conditions and start_times first.
If: Job is inside a BOX that's not RUNNING
Use: FORCE_STARTJOB the BOX, not the inner job. Force-starting an inner job without the box running causes inconsistent state.
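The first branch of the decision list above can be enforced with a small wrapper. This is a sketch, not a standard AutoSys tool: `job_conditions` and `guarded_force_start` are hypothetical names, and the `--override` switch is an assumption added for illustration. It relies only on `autorep -J JOB -q` printing the JIL definition and on the `FORCE_STARTJOB` event shown earlier.

```bash
#!/bin/sh
# Guarded force start (sketch). Refuses to send FORCE_STARTJOB when the job's
# JIL contains a condition: attribute, unless the caller passes --override.

# Reads `autorep -J JOB -q` output (the JIL definition) on stdin and prints
# the value of each "condition:" line.
job_conditions() {
    sed -n 's/^[[:space:]]*condition:[[:space:]]*//p'
}

guarded_force_start() (
    job=$1; override=${2:-}
    conds=$(autorep -J "$job" -q | job_conditions)
    if [ -n "$conds" ] && [ "$override" != "--override" ]; then
        echo "refusing: $job has dependency conditions: $conds" >&2
        exit 2
    fi
    sendevent -E FORCE_STARTJOB -J "$job"
)
```

Calling `guarded_force_start daily_report` on a job with dependencies exits with status 2 and prints the conditions, which forces the operator to read them before deciding to override.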

KILLJOB — terminating a running job

KILLJOB sends a termination signal to the process running on the agent machine. The job moves to TERMINATED status. Any downstream jobs waiting on success() of this job will not start.

Critical nuance: TERMINATED is NOT FAILURE. It's a separate status. A killed job satisfies neither success() nor failure() conditions. To react to a kill, a downstream job must use terminated(job), or done(job), which is true for SUCCESS, FAILURE and TERMINATED alike.

After KILLJOB, the process receives SIGTERM. Well-behaved processes can catch this and clean up. Hung processes may need SIGKILL (AutoSys handles this escalation after a timeout, typically 30 seconds).
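What "well-behaved" means on the job side can be sketched as a signal handler in the job's own script. This is an illustration only: the scratch directory and the handler names are assumptions, and the real work is left as a placeholder.

```bash
#!/bin/sh
# Job-side handling of the termination signal (sketch; names are assumptions).
# On TERM the handler removes partial output, then exits with 143 (128 + 15)
# so the run is never mistaken for a clean finish.
WORK_DIR=$(mktemp -d)          # scratch space for partial output

cleanup() {
    rm -rf "$WORK_DIR"
}

on_term() {
    cleanup
    exit 143
}

trap on_term TERM              # runs when the kill signal arrives
trap cleanup EXIT              # also clean up on a normal exit

# ... the job's real work would write into "$WORK_DIR" here ...
```

A handler like this only helps with catchable signals. SIGKILL cannot be trapped, which is why partial output from a hard kill has to be cleaned up by something outside the job.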

kill_job.sh (Bash)
# Kill a running job
sendevent -E KILLJOB -J hung_etl_job

# After killing, check status
autorep -J hung_etl_job

# Check if any downstream jobs are blocked (lists dependents and condition state)
job_depends -c -J hung_etl_job

# If you need downstream jobs to proceed after the kill:
# confirm the job shows TERMINATED (TE) first, then manually mark it as success
sendevent -E CHANGE_STATUS -J hung_etl_job -s SUCCESS

# Check downstream jobs are now unblocked
autorep -J downstream_job
Output
Job Name ST Exit
hung_etl_job TE -- <- TERMINATED after KILLJOB
Dependents:
downstream_job: waiting on success(hung_etl_job) → condition false
After CHANGE_STATUS SUCCESS:
downstream_job: condition met → job will start normally
KILLJOB vs term_run_time
KILLJOB is a manual kill sent by an operator. term_run_time is an automatic kill triggered by AutoSys when a job exceeds its maximum runtime. Both result in TERMINATED status. The difference is who or what initiated the kill.
Production Insight
A nightly ETL job hung at 3 AM. The on-call engineer killed it with KILLJOB. The job moved to TERMINATED. Five downstream jobs were waiting on success(etl_job). They never started. At 8 AM, the dashboard was empty. The team manually ran the downstream jobs, but the ETL hadn't completed — they ran on stale data.
The mistake: Killing the job doesn't complete the work. The engineer should have investigated why it hung, fixed the root cause (a database lock), then restarted the ETL job properly.
If you MUST unblock downstream without rerunning the killed job, use CHANGE_STATUS SUCCESS after the kill. But this is a bandage — the data may still be incomplete.
Diagnosis: job_depends -c -J killed_job lists the job's dependents and the current state of their conditions, which shows what's blocked.
Rule: KILLJOB terminates the process. It doesn't complete the work. Fix the root cause, then rerun properly. CHANGE_STATUS is for emergencies only.
Key Takeaway
KILLJOB → TERMINATED status. Downstream success() won't trigger.
Use CHANGE_STATUS SUCCESS after kill to unblock dependencies (emergencies only).
job_depends -c -J JOB shows what will be blocked before you kill.
TERMINATED ≠ FAILURE ≠ SUCCESS. Know the difference.
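The emergency unblock described above is safer with a guard that checks the current status first. This is a sketch under one assumption worth stating: if CHANGE_STATUS is sent while the job is still RUNNING, a later TERMINATED status may replace the manual SUCCESS. The function name `unblock_after_kill` is hypothetical; `autostatus -J` prints the job's current status as a single word.

```bash
#!/bin/sh
# Unblock downstream after a kill (sketch). Sends CHANGE_STATUS SUCCESS only
# when autostatus already reports TERMINATED.
unblock_after_kill() (
    job=$1
    status=$(autostatus -J "$job")
    if [ "$status" != "TERMINATED" ]; then
        echo "not unblocking: $job is $status, expected TERMINATED" >&2
        exit 1
    fi
    sendevent -E CHANGE_STATUS -J "$job" -s SUCCESS
)
```

The guard does not verify that the killed job's work is complete. That check still belongs to the operator before the status is asserted.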

RESTART — retrying a failed job

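The RESTART event tells AutoSys to rerun a job that is in FAILURE or TERMINATED status. It's cleaner than FORCE_STARTJOB for rerunning failed jobs because it signals intent as a retry in audit logs. Event names vary by release (Broadcom's sendevent reference for recent releases lists RESTARTJOB), so confirm the exact name against the sendevent usage output on your installation.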

Key difference from FORCE_STARTJOB: RESTART works only on FAILURE or TERMINATED jobs. FORCE_STARTJOB works on any non-running state. RESTART also respects that this is a retry — some AutoSys configurations treat retries differently for alerting purposes.

RESTART does NOT bypass conditions. The job still needs its start conditions satisfied, so a restart cannot stand in for a failed upstream job: if the upstream is the reason the job failed, fix and rerun the upstream first.

restart_job.sh (Bash)
# Restart a failed job after fixing the root cause
sendevent -E RESTART -J failed_extract_job

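# Before restarting, check WHY it failed
autorep -J failed_extract_job -d                      # detailed report for the last run
autorep -J failed_extract_job -q | grep std_err_file  # locate the stderr file, then read it

# Confirm which command the job runs
autorep -J failed_extract_job -q | grep command
# Manually run the command on the agent to verify

# Pattern: check for failures and restart them (for transient failures only)
# autorep has no status filter, so match FA in the ST column of the summary
# report (field 6 when both start and end timestamps are present)
autorep -J % | awk '$6 == "FA" {print $1}' | while read -r job; do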
  echo "Restarting failed job: $job"
  sendevent -E RESTART -J "$job"
done

# For TERMINATED jobs (killed), RESTART also works
sendevent -E RESTART -J killed_job
Output
Restarting failed job: extract_trades
Restarting failed job: load_positions
/* Job extract_trades: FAILURE → STARTING → RUNNING */
/* Restart treated as a new run, not a continuation */
Always check the failure reason before RESTART
RESTARTing without fixing the root cause just fails again. Use autorep -J JOBNAME -d to see the events from the last run, and read the job's std_err_file for the actual error. If it's a transient issue (network timeout), RESTART works. If it's a data issue (missing file), RESTART won't help.
Production Insight
An engineer restarted a failed job 6 times over 2 hours. Each restart failed in 30 seconds. The root cause was a missing directory that the upstream team needed to create. The restarts were pointless and flooded the logs.
The fix: Check the error log first. If the error is transient (connection timeout, temporary lock), RESTART is appropriate. If it's permanent (missing file, permission denied, syntax error), RESTART just creates noise.
Better approach: after a failure, read the job's std_err_file (its path appears in autorep -J JOB -q) and grep it for error keywords. If you see 'No such file' or 'Permission denied', do NOT restart. Fix the root cause first.
Rule: RESTART is for TRANSIENT failures. For permanent failures, fix first, then RESTART once.
Key Takeaway
RESTART works on FAILURE or TERMINATED jobs only.
Check the failure reason first: autorep -J JOB -d shows the run's events, and std_err_file holds the error.
RESTART is for retries, not first runs. Use FORCE_STARTJOB for schedule overrides.
RESTART respects conditions. It's not a magic bypass.
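The transient-versus-permanent rule can be applied mechanically to a job's stderr file before anyone retries. This is a sketch: `classify_failure` is a hypothetical helper, and the keyword lists are examples drawn from the cases above, to be extended per job.

```bash
#!/bin/sh
# Classify a failed run from its stderr file (sketch; the patterns are examples).
# Prints "permanent", "transient" or "unknown".
classify_failure() (
    err_file=$1
    [ -r "$err_file" ] || { echo unknown; exit 0; }

    if grep -Eiq 'No such file|Permission denied|syntax error' "$err_file"; then
        echo permanent        # fix the cause first, then RESTART once
    elif grep -Eiq 'timed out|timeout|temporarily unavailable|deadlock' "$err_file"; then
        echo transient        # a retry is reasonable
    else
        echo unknown          # read the log before doing anything
    fi
)
```

Permanent patterns are tested first on purpose: a log that contains both a timeout and a missing file should not be retried blindly.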

CHANGE_STATUS — manual status override

CHANGE_STATUS is the nuclear option. It manually sets a job's status in the Event Server without running anything. This is how you unblock workflows when a job can't or shouldn't be rerun.

Most common use: A job was killed (KILLJOB) but the work was already done. No point rerunning. Set its status to SUCCESS to fire downstream success() conditions.

Risks: CHANGE_STATUS bypasses ALL validation. No agent communication. No process verification. You're telling AutoSys 'trust me, this job is in this state'. If you're wrong, downstream processes run on incorrect assumptions.

change_status.sh (Bash)
# After a KILLJOB, mark it as SUCCESS to unblock downstream
sendevent -E CHANGE_STATUS -J hung_job -s SUCCESS

# Manually mark a job as FAILURE (e.g., if you see it's going to fail)
sendevent -E CHANGE_STATUS -J running_job -s FAILURE

# Mark a job as INACTIVE to prevent it from running
sendevent -E CHANGE_STATUS -J job_on_hold -s INACTIVE

# Verify the status change took effect (autorep -q prints the JIL, not the status)
autostatus -J hung_job

# Dangerous: Mark a job as SUCCESS that never ran
# Only do this if you have verified the work was done elsewhere
sendevent -E CHANGE_STATUS -J never_ran_job -s SUCCESS
Output
/* Job status changed from TE (TERMINATED) to SU (SUCCESS) */
/* Downstream jobs with success(hung_job) now evaluate as true */
/* No actual process ran — you are asserting correctness */
CHANGE_STATUS is a manual override — use sparingly
Every CHANGE_STATUS SUCCESS on a job that didn't actually succeed is a potential data corruption event. Document every use. Audit change logs weekly. If you're using it more than once per month, you have a deeper reliability problem.
Production Insight
A team had a job that wrote a file and then a downstream job processed it. The upstream job completed SUCCESS. The file was on disk. The file watcher downstream didn't trigger due to a run_window misconfiguration. The team needed the downstream to run now.
Instead of fixing the run_window (would require update_job, JIL change, tested in non-prod), they used CHANGE_STATUS on the upstream job. But the upstream was already SUCCESS. That did nothing.
They then tried CHANGE_STATUS on the downstream job, assuming it was RUNNING and could be moved to SUCCESS. It was actually INACTIVE (waiting on the file). CHANGE_STATUS doesn't trigger file detection.
Two hours of confusion later, they fixed the run_window and the workflow completed.
Lesson: CHANGE_STATUS only changes the stored status. It runs nothing, does not make a file watcher detect a file, and does not open a time gate such as run_window. A success(upstream) condition can evaluate true after the change, but every other gate on the dependent job (box state, date conditions, hold or ice) still applies.
If the dependent job still doesn't start after CHANGE_STATUS, check what is holding it with job_depends -c, and trigger it manually with FORCE_STARTJOB only after verifying its inputs.
Key Takeaway
CHANGE_STATUS manually sets job status. No agent call. No verification.
Use after KILLJOB to unblock downstream when work is actually done.
Does NOT run the job, open time gates, or trigger file watchers.
Document every use. Frequent use = process problem, not tool problem.
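"Document every use" is easier to enforce when the documentation is part of the command. This is a sketch of an audited wrapper: the function name, the log format, and the default `AUDIT_LOG` path are all assumptions, not AutoSys features.

```bash
#!/bin/sh
# Audited CHANGE_STATUS (sketch). Requires a reason and appends who/when/why
# to a log before sending the event. The default log path is an assumption.
AUDIT_LOG=${AUDIT_LOG:-/var/tmp/autosys_manual_overrides.log}

audited_change_status() (
    job=$1; new_status=$2; reason=${3:-}
    if [ -z "$reason" ]; then
        echo "usage: audited_change_status JOB STATUS REASON" >&2
        exit 64
    fi
    # Write the audit line first; refuse to send the event if logging fails.
    printf '%s user=%s job=%s status=%s reason=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(id -un)" "$job" "$new_status" "$reason" \
        >> "$AUDIT_LOG" || exit 1
    sendevent -E CHANGE_STATUS -J "$job" -s "$new_status"
)
```

Counting lines in the log per month gives a direct measure for the "more than once per month" threshold mentioned above.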

AutoSys Job Recovery: The Incident Report You Weren't Expecting

You force-started a job at 3 AM to unblock the batch pipeline. By 5 AM, the five downstream jobs in the same box still had not run: the box was never RUNNING, so their conditions were never evaluated. Your manual override stepped around all the dependency checks you'd spent weeks building. The production incident report won't mention your name, but your teammates will remember.

FORCE_STARTJOB does not mark a run as special for dependency purposes. When the forced job exits zero it is SUCCESS like any other run, and success() conditions on it evaluate true. What the force start skips is everything around the job: its own start conditions were never checked, and if you forced a child while its box was not RUNNING, the other jobs in that box stay where they are. That's why careful operators build a recovery job: a wrapper that validates the forced job's actual output and only then lets downstream proceed, using CHANGE_STATUS or a box-level start where needed.

The recovery pattern is simple: after the force start completes, run a secondary process that validates the output (file landed, DB updated, API responded). If validation passes, downstream jobs key off the recovery job's SUCCESS. If it fails, the recovery job goes to FAILURE and the chain stops. Your control room will thank you when the morning batch doesn't implode.

ForceStartRecovery.jil (JIL)
/* Recovery job: validates the forced job's output before downstream proceeds */
/* Insert it before force-starting; it runs when the forced job reports SUCCESS */

insert_job: RECVRY_JOB_PAYMENT_PROCESSOR
job_type: CMD
command: /opt/autosys/recovery/validate_and_set_condition.sh
machine: payment-prod-01
description: "validate forced job output then set SUCCESS flag in box"
condition: s(PAYMENT_PROCESSOR_DUMP)
date_conditions: 0
timezone: America/New_York
profile: /home/autosys/.profile
Output
/* Job RECVRY_JOB_PAYMENT_PROCESSOR inserted */
/* Start condition: s(PAYMENT_PROCESSOR_DUMP) */
/* Status: INACTIVE until the forced job completes with SUCCESS */
Production Trap:
Never force-start a job inside a complex box without immediately scheduling a validation job. If the box was not RUNNING, the other jobs in it will not follow on their own, and the scheduler won't recover that state for you.
Key Takeaway
FORCE_STARTJOB skips the checks that downstream jobs rely on. Always run a validation wrapper after any forced job, before downstream consumes its output.
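The JIL above points at a validation script without showing one. The following is a hypothetical body for such a wrapper, assuming the forced job writes a CSV with a one-line header. The function name, file layout, and thresholds are illustrative assumptions.

```bash
#!/bin/sh
# Hypothetical body for a validation wrapper such as validate_and_set_condition.sh.
# The exit code is the whole contract: 0 lets jobs conditioned on the recovery
# job's success start, non-zero puts the recovery job in FAILURE and stops the chain.
validate_output() (
    out_file=$1; min_rows=$2
    [ -s "$out_file" ] || { echo "empty or missing: $out_file" >&2; exit 1; }

    # count data rows, skipping the one-line header
    rows=$(tail -n +2 "$out_file" | wc -l | tr -d ' ')
    [ "$rows" -ge "$min_rows" ] || { echo "only $rows data rows in $out_file" >&2; exit 1; }
)

# Example entry point (path and threshold are assumptions):
# validate_output /data/out/payment_dump.csv 1 || exit 1
```

A header-only file fails this check, which is the zero-row "SUCCESS" case that otherwise slips through on status alone.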

KILLJOB Deep Cut: Graceful vs. Slaughter Mode

Standard KILLJOB sends SIGTERM. That's the polite wave that says "please finish your transaction and exit." But when you've got a runaway Java process chewing through 16 GB of heap and ignoring polite requests, you need the nuclear option. On UNIX agents, sendevent accepts a -k option naming the signal that KILLJOB should deliver, so sendevent -E KILLJOB -J JOBNAME -k 9 sends SIGKILL. The consequences are easy to miss.

When you send SIGKILL, the process cannot catch the signal or clean up, and the job ends in TERMINATED status. Here's the trap: if your job had open resources (database connections, temporary files, NFS locks), those get orphaned. The DB connection pool will leak. Temp files stay behind and fill up disk. NFS locks stay held.

Senior engineers build a two-stage kill procedure into their runbooks: first KILLJOB (SIGTERM), wait 30 seconds, then check autostatus -J JOBNAME for TERMINATED. If it's still RUNNING, fire KILLJOB with -k 9. After that, run a cleanup script that checks for orphaned temp files by PID, kills any residual processes by PGID, and logs the incident with a timestamp for your SRE dashboard. Do this right once, and the 2 AM page becomes a 5-minute fix instead of a 3-hour war room.

GracefulKillProcedure.jil (JIL)
/* Two-stage kill procedure for production jobs */
/* JIL's command attribute is a single line, so each stage is one sh -c invocation */
/* machine is reused from the recovery example; set your own */

/* Stage 1: polite kill */
insert_job: KILL_PAYMENT_BATCH
job_type: CMD
machine: payment-prod-01
command: sh -c '[ "$(autostatus -J PAYMENT_BATCH)" = "RUNNING" ] && sendevent -E KILLJOB -J PAYMENT_BATCH && echo "SIGTERM sent at $(date)" >> /tmp/kill_log_'
box_name: BOX_WEBHOOKS_PROCESSING
description: "two-stage kill with cleanup"

/* Stage 2: runs after stage 1, waits 30 seconds, escalates only if still RUNNING */
/* Residual processes are left to the cleanup script; grepping ps output would also match the grep itself */
insert_job: FORCE_KILL_PAYMENT_BATCH
job_type: CMD
machine: payment-prod-01
condition: s(KILL_PAYMENT_BATCH)
command: sh -c 'sleep 30; if [ "$(autostatus -J PAYMENT_BATCH)" = "RUNNING" ]; then sendevent -E KILLJOB -J PAYMENT_BATCH -k 9; echo "SIGKILL sent at $(date)" >> /tmp/kill_log_; fi; /opt/autosys/cleanup/orphaned_files.sh -j PAYMENT_BATCH'
box_name: BOX_WEBHOOKS_PROCESSING
Output
=== Stage 1 ===
PAYMENT_BATCH status: RUNNING
Event: KILLJOB sent to PAYMENT_BATCH
=== Stage 2 (30 sec later) ===
PAYMENT_BATCH status: RUNNING (still alive)
Event: KILLJOB sent to PAYMENT_BATCH with -k 9
Cleanup script found 2 orphaned temp files, removed.
Process group terminated.
Log written to /tmp/kill_log_.
Senior Shortcut:
Use autostatus -J JOBNAME to poll job status in your kill scripts. It prints the current status as a single word, so you can compare it to RUNNING directly instead of parsing the autorep report.
Key Takeaway
KILLJOB sends SIGTERM first. Only use SIGKILL if process survives 30 seconds. Always clean up orphaned handles afterward.
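For operators who run the procedure by hand, the same two stages fit in one shell function. This is a sketch: `two_stage_kill`, `GRACE_SECS` and `POLL_SECS` are hypothetical names, and the `-k 9` escalation assumes a UNIX agent.

```bash
#!/bin/sh
# Two-stage kill (sketch). Stage 1 sends the default KILLJOB; stage 2 escalates
# with -k 9 only if the job is still RUNNING after the grace period.
# GRACE_SECS and POLL_SECS are assumptions; tune them per job.
GRACE_SECS=${GRACE_SECS:-30}
POLL_SECS=${POLL_SECS:-5}

two_stage_kill() (
    job=$1
    sendevent -E KILLJOB -J "$job" || exit 1

    waited=0
    while [ "$waited" -lt "$GRACE_SECS" ]; do
        [ "$(autostatus -J "$job")" = "RUNNING" ] || exit 0   # stopped in time
        sleep "$POLL_SECS"
        waited=$((waited + POLL_SECS))
    done

    # still running after the grace period: escalate to SIGKILL
    [ "$(autostatus -J "$job")" = "RUNNING" ] || exit 0
    sendevent -E KILLJOB -J "$job" -k 9
)
```

Polling instead of a fixed sleep means a job that exits promptly returns control in seconds, and the escalation only happens when it is actually needed.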
Production incident · POST-MORTEM · severity: high

The FORCE_STARTJOB That Corrupted the Ledger

Symptom
Job completed SUCCESS. All downstream jobs ran normally. No errors in any AutoSys logs. The final output was wrong — incorrect aggregations, missing transactions. Manual reconciliation took three days to identify the root cause.
Assumption
The engineer assumed the job only had time-based conditions (start_times). FORCE_STARTJOB would just override the schedule. They didn't check the full JIL for condition dependencies.
Root cause
The reporting job had condition: success(validate_ledger) AND success(extract_transactions). The validate_ledger job had failed earlier. The on-call engineer saw the reporting job in INACTIVE status and used FORCE_STARTJOB. FORCE_STARTJOB bypassed BOTH conditions. The job ran without validated ledger data. The output was based on incomplete extracts. No alarm fired because the job succeeded. The team didn't know about the condition dependency because they only skimmed the autorep -q output for start_times and missed the condition line.
Fix
1. Never FORCE_STARTJOB a job without running autorep -J jobname -q | grep condition first.
2. For failed dependencies, fix and restart the dependency chain, not the leaf job.
3. Add box_terminator: y on validation jobs so a failed validation terminates the whole box, which makes the failure hard to overlook before anyone force-starts a leaf job.
4. Create an audit script that logs all FORCE_STARTJOB events and flags any that bypassed conditions.
Key lesson
  • FORCE_STARTJOB bypasses ALL conditions — time AND dependencies.
  • Always check autorep -q | grep condition before force-starting.
  • Fixing a leaf job without fixing its dependencies propagates corruption.
  • Force-start is for schedule overrides, not dependency bypasses.
Production debug guide · What to run when jobs won't start or won't die · 4 entries
Symptom · 01
Job stuck in RUNNING, need to terminate it
Fix
Use KILLJOB. Then check if downstream jobs are blocked on success(). If yes, use CHANGE_STATUS -s SUCCESS after the kill. Verify with autorep -J JOB -d.
Symptom · 02
Job won't start — INACTIVE or condition not met, but you need output now
Fix
First check dependencies: autorep -J JOB -q | grep condition. If dependencies exist, fix them first. Otherwise use FORCE_STARTJOB.
Symptom · 03
Job failed on transient error, you fixed the cause, need to retry
Fix
Use RESTART (not FORCE_STARTJOB). RESTART is semantically a retry and respects that the job previously failed. Check the last run first: autorep -J JOB -d, then read the job's std_err_file.
Symptom · 04
Downstream jobs waiting on a job that was killed or manually fixed
Fix
Use CHANGE_STATUS -s SUCCESS on the upstream job. This fires success() conditions without rerunning the job. Verify autorep shows SUCCESS before downstream runs.
sendevent: 60-Second Emergency Reference · Run these commands when normal scheduling fails and you need to intervene
Job won't start — need output now regardless of conditions
Immediate action
Check conditions first, then force-start
Commands
autorep -J JOBNAME -q | grep -E 'condition|date_conditions|start_times'
sendevent -E FORCE_STARTJOB -J JOBNAME
Fix now
sendevent -E FORCE_STARTJOB -J JOBNAME
Job hung, needs termination
Immediate action
Kill the job, then unblock downstream
Commands
sendevent -E KILLJOB -J JOBNAME
sendevent -E CHANGE_STATUS -J JOBNAME -s SUCCESS
Fix now
sendevent -E KILLJOB -J JOBNAME && sendevent -E CHANGE_STATUS -J JOBNAME -s SUCCESS
Job failed after transient error, need retry
Immediate action
Verify failure reason, then restart
Commands
autorep -J JOBNAME -d
sendevent -E RESTART -J JOBNAME
Fix now
sendevent -E RESTART -J JOBNAME
Downstream blocked on job that won't rerun
Immediate action
Manually set upstream to SUCCESS
Commands
autostatus -J UPSTREAM_JOB
sendevent -E CHANGE_STATUS -J UPSTREAM_JOB -s SUCCESS
Fix now
sendevent -E CHANGE_STATUS -J UPSTREAM_JOB -s SUCCESS
sendevent Commands — When to Use Which
Event | Respects conditions? | Works on status | What actually happens | Typical use
FORCE_STARTJOB | No (bypasses all) | Any non-running state | Job starts immediately, no condition checks | Run job outside its schedule (fixing missed window)
STARTJOB | Yes | INACTIVE / ACTIVATED | Job starts only if time and conditions are met | Trigger after fixing a condition (rarely used)
RESTART | Yes (transient failures only) | FAILURE / TERMINATED | Reruns failed job, respects that it's a retry | Rerun after fixing a transient failure
KILLJOB | N/A | RUNNING only | SIGTERM to agent process → TERMINATED status | Terminate hung or infinite-loop job
CHANGE_STATUS + SUCCESS | N/A | Any | Manually sets status without running anything | Unblock downstream after manual verification
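The "Works on status" column can be turned into a lookup for runbooks. This is a sketch: `suggest_event` is a hypothetical helper that accepts either the status words printed by autostatus or the two-letter codes shown by autorep, and returns the event this article recommends for that state.

```bash
#!/bin/sh
# Map a job's current status to the event recommended in the table (sketch).
# Accepts autostatus words or autorep two-letter codes.
suggest_event() {
    case $1 in
        RUNNING|RU)
            echo "KILLJOB (only if hung)" ;;
        FAILURE|FA)
            echo "RESTART (after fixing the cause)" ;;
        TERMINATED|TE)
            echo "RESTART, or CHANGE_STATUS if the work is verified done" ;;
        INACTIVE|IN|ACTIVATED|AC)
            echo "check conditions, then FORCE_STARTJOB" ;;
        SUCCESS|SU)
            echo "none" ;;
        *)
            echo "unknown status: $1" ;;
    esac
}
```

For example, `suggest_event "$(autostatus -J daily_report)"` prints the recommended next step for the job's current state. It is advice for the operator, not something to pipe straight into sendevent.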

Key takeaways

1. FORCE_STARTJOB bypasses ALL conditions: time AND dependencies. Fix dependencies first.
2. KILLJOB → TERMINATED. Downstream success() won't fire. Use CHANGE_STATUS to unblock if work was done.
3. RESTART is for retrying FAILED/TERMINATED jobs after transient failures. Check failure reason first.
4. CHANGE_STATUS is a manual override. It changes the stored status only: it runs nothing, verifies nothing, and opens no time or file gates.
5. Always check conditions before force-starting: autorep -J JOB -q | grep condition.
6. Document every manual intervention. If you're using these commands weekly, your automation is broken.

Common mistakes to avoid

Force-starting a job without checking conditions first

Symptom
Job runs but produces corrupt output because upstream dependencies were skipped. Downstream processes continue on bad data. Error is caught days later during reconciliation.
Fix
Always run autorep -J JOB -q | grep condition before FORCE_STARTJOB. If conditions exist, fix the dependency chain instead. Use RESTART on failed dependencies, then let the original job start naturally.

Killing a job but not unblocking downstream dependencies

Symptom
Killed job shows TERMINATED. Downstream jobs stay INACTIVE waiting for success(). Team manually reruns downstream jobs but missing the killed job's work.
Fix
After KILLJOB, run job_depends -c -J killed_job. If downstream jobs are waiting, either:
  • RESTART the killed job (if work wasn't done)
  • CHANGE_STATUS to SUCCESS (if work was done elsewhere)

Using FORCE_STARTJOB when RESTART would be appropriate

Symptom
Job fails transiently. Engineer force-starts instead of restarting. Audit logs show FORCE_STARTJOB on a FAILED job, making it look like an emergency bypass when it was just a retry.
Fix
Use RESTART for retrying failed jobs. It signals intent correctly in logs and respects that this is a retry (some alerting systems treat RESTART differently from FORCE_STARTJOB).

Assuming CHANGE_STATUS alone starts every blocked downstream job

Symptom
Engineer changes a job from FAILURE to SUCCESS. Some downstream jobs with success(upstream) still don't start. Team assumes AutoSys is broken.
Fix
CHANGE_STATUS changes the stored status. It does not open any other gate on the dependent job. If a downstream job stays put after the change:
  • Check what else it is waiting on with job_depends -c -J downstream_job (box not RUNNING, date conditions, other upstream jobs, hold or ice)
  • Force-start the downstream job directly, once its inputs are verified
  • For critical paths, rerun the upstream job so its status is earned rather than asserted

Using FORCE_STARTJOB on a child job inside a non-running BOX

Symptom
Force-started child job runs, but the BOX still shows INACTIVE or FAILURE. Downstream jobs that depend on the BOX never run because the BOX never succeeded.
Fix
Force-start the BOX, not the child. The child will start naturally if its conditions are met. If you must run only the child, the BOX state will be inconsistent — consider moving the job outside the BOX for manual runs.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Junior): What is the difference between FORCE_STARTJOB and STARTJOB?
Q02 (Junior): What happens to a job's status after KILLJOB?
Q03 (Senior): What is the difference between RESTART and FORCE_STARTJOB for a failed j...
Q04 (Senior): How would you unblock dependent jobs after a job was killed?
Q05 (Junior): If term_run_time terminates a job, what status does it move to?
Q06 (Senior): What are the risks of using CHANGE_STATUS to mark a job as SUCCESS when ...

Q01 of 06 · Junior

What is the difference between FORCE_STARTJOB and STARTJOB?

ANSWER
FORCE_STARTJOB starts the job immediately, bypassing ALL starting conditions including date_conditions, start_times, and condition dependencies. STARTJOB respects conditions — the job will only start if its time schedule and dependency conditions are all true. In practice, STARTJOB is rarely used because if conditions are already true, the job would have started on its own. FORCE_STARTJOB is the emergency override. STARTJOB might be used after manually fixing a condition that AutoSys hasn't re-evaluated yet.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01. How do I force start an AutoSys job from the command line?
02. What happens after KILLJOB in AutoSys?
03. What is the difference between RESTART and FORCE_STARTJOB?
04. Can I force start a job that is inside a BOX?
05. What status does a job have after term_run_time kills it?
06. Does CHANGE_STATUS automatically trigger downstream jobs?