
sendevent: 4 Commands That Fix AutoSys


📅 March 19, 2026 ⏱ 3 min read 🎯 Intermediate


FORCE_STARTJOB bypasses conditions.

⚙️ Intermediate — basic DevOps knowledge assumed

In this tutorial, you'll learn


FORCE_STARTJOB bypasses ALL conditions — time AND dependencies. Fix dependencies first.
KILLJOB → TERMINATED. Downstream success() won't fire. Use CHANGE_STATUS to unblock if work was done.
RESTART is for retrying FAILED/TERMINATED jobs after transient failures. Check failure reason first.

[Figure: Manual Job Control Flow]


⚡Quick Answer

FORCE_STARTJOB: runs job immediately, bypasses ALL conditions (time + dependencies). Use when you need output now, but know what you're skipping.
KILLJOB: terminates running job → TERMINATED status. Downstream success() conditions won't fire. If the work was actually done, use CHANGE_STATUS SUCCESS afterward to unblock.
RESTART: retry a FAILED or TERMINATED job. Cleaner than FORCE_STARTJOB for retries — audit logs show intent.
STARTJOB: respects conditions. Job starts only if its time + dependency gates are open. Practically useless in emergencies.
Production rule: FORCE_STARTJOB without understanding dependencies corrupts data. KILLJOB without CHANGE_STATUS blocks workflows for hours.

🚨 START HERE

sendevent — 60-Second Emergency Reference

Run these commands when normal scheduling fails and you need to intervene

🟡

Job won't start — need output now regardless of conditions

Immediate action: Check conditions first, then force-start

Commands

autorep -J JOBNAME -q | grep -E 'condition|date_conditions|start_times'

sendevent -E FORCE_STARTJOB -J JOBNAME

Fix now: sendevent -E FORCE_STARTJOB -J JOBNAME

🟡

Job hung — needs termination

Immediate action: Kill the job, then unblock downstream only if the work is verified complete

Commands

sendevent -E KILLJOB -J JOBNAME

sendevent -E CHANGE_STATUS -J JOBNAME -s SUCCESS

Fix now (only after verifying the work completed): sendevent -E KILLJOB -J JOBNAME && sendevent -E CHANGE_STATUS -J JOBNAME -s SUCCESS

🟡

Job failed after transient error — need retry

Immediate action: Verify the failure reason, then restart

Commands

autorep -J JOBNAME -d

sendevent -E RESTART -J JOBNAME

Fix now: sendevent -E RESTART -J JOBNAME

🟡

Downstream blocked on job that won't rerun

Immediate action: Manually set upstream to SUCCESS

Commands

autorep -J UPSTREAM_JOB

sendevent -E CHANGE_STATUS -J UPSTREAM_JOB -s SUCCESS

Fix now: sendevent -E CHANGE_STATUS -J UPSTREAM_JOB -s SUCCESS

Production Incident

The FORCE_STARTJOB That Corrupted the Ledger

A reporting job didn't run at midnight. The on-call engineer force-started it at 1 AM. The job ran, produced output, and the nightly batch continued. Three days later, the ledger was off by $2.4M. The force-start had bypassed a critical validation job.

Symptom: The job completed with SUCCESS. All downstream jobs ran normally. No errors appeared in any AutoSys log. The final output was wrong: incorrect aggregations, missing transactions. Manual reconciliation took three days to identify the root cause.

Assumption: The engineer assumed the job had only time-based conditions (start_times), so FORCE_STARTJOB would just override the schedule. They didn't check the full JIL for condition dependencies.

Root cause: The reporting job had condition: success(validate_ledger) AND success(extract_transactions). The validate_ledger job had failed earlier, so the reporting job sat in INACTIVE status. The on-call engineer saw it there and used FORCE_STARTJOB, which bypassed BOTH conditions. The job ran without validated ledger data, and its output was based on incomplete extracts. No alarm fired because the job succeeded. The condition line was in the autorep -q output all along, but buried among the other JIL attributes, so nobody noticed it.

Fix:
1. Never FORCE_STARTJOB a job without first running autorep -J jobname -q | grep condition.
2. For failed dependencies, fix and restart the dependency chain, not the leaf job.
3. Set box_terminator on validation jobs so a validation failure terminates the whole box instead of leaving leaf jobs waiting to be force-started.
4. Create an audit script that logs every FORCE_STARTJOB event and flags any that bypassed conditions.
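Item 4 can be sketched as a thin wrapper around sendevent. This is a minimal sketch, assuming bash and that autorep -J JOB -q prints the job's JIL with condition: on its own line; audited_force_start and the FORCE_AUDIT_LOG path are hypothetical names, not AutoSys features. It records who force-started what, and whether the job had a condition: dependency that the force-start bypassed.

```shell
#!/usr/bin/env bash
# Sketch: log every FORCE_STARTJOB with a flag saying whether the job had
# dependency conditions that the force-start bypassed.
# audited_force_start and FORCE_AUDIT_LOG are hypothetical names.

FORCE_AUDIT_LOG=${FORCE_AUDIT_LOG:-./force_start_audit.log}

# audited_force_start JOB REASON
audited_force_start() {
  local job=${1:-} reason=${2:-} jil bypassed=no
  if [[ -z $job || -z $reason ]]; then
    echo "usage: audited_force_start JOB REASON" >&2
    return 2
  fi
  # Read the job definition; stop rather than force-start blind if that fails.
  if ! jil=$(autorep -J "$job" -q); then
    echo "could not read JIL for $job" >&2
    return 1
  fi
  # A "condition:" attribute means dependencies exist (date_conditions does not match).
  if grep -qE '^[[:space:]]*condition:' <<< "$jil"; then
    bypassed=yes
  fi
  # One tab-separated line: UTC time, user, job, bypass flag, reason.
  printf '%s\t%s\t%s\tbypassed_conditions=%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$job" "$bypassed" "$reason" \
    >> "$FORCE_AUDIT_LOG"
  sendevent -E FORCE_STARTJOB -J "$job"
}
```

Grepping the log for bypassed_conditions=yes then gives the list of force-starts worth a second look in the post-mortem.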

Key Lesson

FORCE_STARTJOB bypasses ALL conditions: time AND dependencies.
Always check autorep -q | grep condition before force-starting.
Fixing a leaf job without fixing its dependencies propagates corruption.
Force-start is for schedule overrides, not dependency bypasses.

Production Debug Guide

What to run when jobs won't start or won't die

Job stuck in RUNNING, need to terminate it → Use KILLJOB. Then check whether downstream jobs are blocked on success(). If they are and the work was actually done, use CHANGE_STATUS -s SUCCESS after the kill. Verify with autorep -d.

Job won't start (INACTIVE or condition not met), but you need output now → First check dependencies: autorep -J JOB -q | grep condition. If dependencies exist, fix them first. Otherwise use FORCE_STARTJOB.

Job failed on a transient error, you fixed the cause, need to retry → Use RESTART (not FORCE_STARTJOB). RESTART is semantically a retry and records that the job previously failed. Check the failure first: autorep -J JOB -d shows the exit code, and the job's std_err_file shows the error.

Downstream jobs waiting on a job that was killed or manually fixed → Use CHANGE_STATUS -s SUCCESS on the upstream job. This makes success() conditions evaluate true without rerunning the job. Verify that autorep shows SUCCESS, then confirm the downstream jobs actually start; if they don't, force-start them.

Production AutoSys environments need manual intervention. Jobs hang. Downstream dependencies get stuck. A fix deploys and you need to rerun a failed job at 2 AM.

Knowing the exact sendevent command is table stakes. Knowing what happens after — that's the senior engineer difference.

FORCE_STARTJOB bypasses conditions. KILLJOB leaves downstream jobs waiting. RESTART is for retries, not first runs. This article covers the side effects that incident post-mortems reveal.

FORCE_STARTJOB — bypassing all conditions

FORCE_STARTJOB immediately starts a job regardless of its date_conditions, start_times, or condition dependencies. It's the 'run it now, no questions asked' command.

Critical nuance: FORCE_STARTJOB bypasses EVERYTHING. Not just the schedule. Not just the time gates. Also any condition: success(other_job) dependencies. The job runs even if its upstream dependencies never ran or failed.

This is the most dangerous sendevent command. Use it only when you fully understand what conditions exist on the job.
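One way to make "understand the conditions first" enforceable is a guard that refuses to force-start a job with dependency conditions unless the operator opts in explicitly. A sketch under assumptions: bash, and autorep -q output with condition: at the start of a line. guarded_force_start, dependency_condition and the --bypass-conditions flag are hypothetical. Anchoring the match on condition: also avoids a pitfall of a plain grep condition, which matches date_conditions too.

```shell
#!/usr/bin/env bash
# Sketch: refuse FORCE_STARTJOB when the job has dependency conditions,
# unless the operator explicitly acknowledges the bypass.
# guarded_force_start and dependency_condition are hypothetical helpers.

# Reads JIL (e.g. from `autorep -J JOB -q`) on stdin and prints the condition
# expression, if any. "date_conditions:" lines are not matched.
dependency_condition() {
  sed -n 's/^[[:space:]]*condition:[[:space:]]*//p'
}

# guarded_force_start JOB [--bypass-conditions]
guarded_force_start() {
  local job=${1:-} ack=${2:-} jil cond
  # If the JIL cannot be read, refuse rather than assume there are no conditions.
  if ! jil=$(autorep -J "$job" -q); then
    echo "cannot read JIL for $job" >&2
    return 1
  fi
  cond=$(dependency_condition <<< "$jil")
  if [[ -n $cond && $ack != --bypass-conditions ]]; then
    echo "REFUSED: $job has dependencies: $cond" >&2
    echo "Fix the upstream jobs, or rerun with --bypass-conditions." >&2
    return 1
  fi
  sendevent -E FORCE_STARTJOB -J "$job"
}
```

A job with only date_conditions and start_times passes straight through, which matches the "schedule override" case this section calls safe.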

force_start.sh · BASH


# ALWAYS check what conditions you're about to bypass first
autorep -J daily_report -q | grep condition

# Check current status
autorep -J daily_report

# Force start a single job immediately
sendevent -E FORCE_STARTJOB -J daily_report

# Force start a BOX (starts the box, inner jobs still follow their conditions)
sendevent -E FORCE_STARTJOB -J eod_processing_box

# Send the force-start at a later time instead of now
# (-T sets when the event is sent; check the date format your version expects)
sendevent -E FORCE_STARTJOB -J daily_report -T "03/19/2026 02:00"

▶ Output

/* Before force-start: check conditions */
condition: success(extract_trades) AND success(validate_positions)

/* Event sent: FORCE_STARTJOB for daily_report */
/* Job daily_report: STARTING → RUNNING (02:00:01) */
/* Both dependencies were skipped entirely */

⚠ Bypassed conditions have consequences

FORCE_STARTJOB skips ALL conditions, including dependency conditions. daily_report depends on extract_trades completing first, so force-starting it means it runs without the extract data. Make sure you understand what conditions you're bypassing.

📊 Production Insight

A team force-started a billing job at 2 AM after a database timeout. The job ran, generated invoices, and sent them to customers. Three days later, the finance team noticed duplicate invoices. The billing job had a condition: success(settlement_validation). The validation job had failed at 1:55 AM. The force-start bypassed it completely.

The fix: Never force-start a job with dependencies. Fix the dependency chain instead. Use RESTART on the failed validation job, let it succeed, then the original job will start normally through conditions.

Diagnosis: autorep -J jobname -q | grep condition shows the dependencies. If non-empty, fix them first.

Rule: FORCE_STARTJOB is for schedule overrides, not dependency bypasses.
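To see which upstream jobs a force-start would skip, you can pull the job names out of the condition expression and check each one's status. A minimal sketch, assuming bash with grep -o, and only the condition functions success/s, failure/f, done/d, terminated/t and notrunning/n; condition_upstreams and show_upstream_status are hypothetical helpers, and exitcode() or global-variable expressions are not parsed.

```shell
#!/usr/bin/env bash
# condition_upstreams: reads a condition expression on stdin and prints each
# upstream job it names, one per line, sorted and de-duplicated. Drops a
# lookback argument such as "success(job, 12.00)".
condition_upstreams() {
  grep -oE '(success|failure|done|terminated|notrunning|s|f|d|t|n)\([^),]+' |
    sed -E 's/^[a-z]+\([[:space:]]*//; s/[[:space:]]+$//' |
    sort -u
}

# show_upstream_status JOB: prints the autorep status report of every job
# that JOB's condition depends on.
show_upstream_status() {
  local job=$1 up
  autorep -J "$job" -q |
    sed -n 's/^[[:space:]]*condition:[[:space:]]*//p' |
    condition_upstreams |
    while read -r up; do
      autorep -J "$up"
    done
}
```

Running show_upstream_status billing_job before a force-start would have shown settlement_validation in FA, which is exactly the signal the team in this story missed.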

🎯 Key Takeaway

FORCE_STARTJOB bypasses ALL conditions — time AND dependencies.

Always check conditions first: autorep -J JOB -q | grep condition.

Fix failed dependencies, don't skip them. Force-start propagates corruption.

For boxes, start the box, not the child.

Should you FORCE_STARTJOB or fix dependencies first?

If: Job has condition dependencies that are FAILED or never ran

→

Use: Do NOT force-start. Fix the dependencies first (RESTART or rerun) and let the conditions trigger naturally.

If: Job has only time-based conditions (start_times, days_of_week)

→

Use: FORCE_STARTJOB is safe; you're just overriding the schedule.

If: Job has no conditions at all

→

Use: FORCE_STARTJOB works, but why isn't the job running? Check date_conditions and start_times first.

If: Job is inside a BOX that's not RUNNING

→

Use: FORCE_STARTJOB the BOX, not the inner job. Force-starting an inner job while the box isn't running causes inconsistent state.
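The decision table above can be encoded as a small function for a runbook script. The argument names, and the precedence of checking the box state before the dependencies, are assumptions of this sketch rather than AutoSys behavior.

```shell
#!/usr/bin/env bash
# force_start_advice DEPS_UNMET HAS_SCHEDULE BOX_STATE
#   DEPS_UNMET:   yes if any condition: dependency is FAILED or never ran
#   HAS_SCHEDULE: yes if the job uses date_conditions / start_times
#   BOX_STATE:    none (not in a box), running, or stopped
# Precedence (box first, then dependencies) is this sketch's assumption.
force_start_advice() {
  local deps_unmet=$1 has_schedule=$2 box_state=$3
  if [[ $box_state == stopped ]]; then
    echo "FORCE_STARTJOB the BOX, not the inner job"
  elif [[ $deps_unmet == yes ]]; then
    echo "Do not force-start: fix the dependencies first (RESTART or rerun)"
  elif [[ $has_schedule == yes ]]; then
    echo "FORCE_STARTJOB is safe: you are only overriding the schedule"
  else
    echo "FORCE_STARTJOB works, but first find out why the job is not running"
  fi
}
```

Checking the box first is deliberate: force-starting the box does not bypass the child's own conditions, so unmet dependencies are still respected once the box runs.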

KILLJOB — terminating a running job

KILLJOB sends a termination signal to the process running on the agent machine. The job moves to TERMINATED status. Any downstream jobs waiting on success() of this job will not start.

Critical nuance: TERMINATED is NOT FAILURE. It's a separate status. A killed job doesn't satisfy success() conditions, and it doesn't satisfy failure() conditions either; use terminated() or done() in a condition if a downstream job should react to a kill.

After KILLJOB, the process receives SIGTERM. Well-behaved processes can catch it and clean up. Hung processes may need SIGKILL; AutoSys can escalate to it, but the exact signals and delay depend on your version and agent configuration.
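On the job-script side, "catch SIGTERM and clean up" can look like the following. This is generic bash, not AutoSys-specific, and it assumes the kill arrives as SIGTERM; do_work and the .partial naming are hypothetical. The script writes to a temporary file and renames it only at the end, so a killed run leaves no half-written output for a later RESTART or a downstream job to pick up.

```shell
#!/usr/bin/env bash
# Sketch of a job script that cleans up when it receives SIGTERM (e.g. from
# KILLJOB). do_work and the ".partial" naming are hypothetical.

partial_file=""

on_term() {
  # Remove half-written output so a rerun never picks it up.
  if [[ -n $partial_file ]]; then
    rm -f -- "$partial_file"
  fi
  # 128 + 15: the conventional exit status for a process ended by SIGTERM.
  exit 143
}

# do_work FINAL_FILE ROWS: writes ROWS lines, one per second, to a partial
# file and renames it to FINAL_FILE only when everything has been written.
do_work() {
  local final_file=$1 rows=$2 i
  partial_file="${final_file}.partial"
  trap on_term TERM
  : > "$partial_file"
  for (( i = 1; i <= rows; i++ )); do
    echo "row $i" >> "$partial_file"
    sleep 1   # stands in for real work
  done
  mv -- "$partial_file" "$final_file"   # same-directory rename: all or nothing
  partial_file=""
  trap - TERM
}
```

Bash runs a trap only after the current foreground command (here, sleep) returns, so cleanup can lag the signal by up to one loop iteration. Keep that delay well under whatever grace period your AutoSys setup allows before SIGKILL.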

kill_job.sh · BASH


# Before killing, check the job's dependencies: anything waiting on
# success(hung_etl_job) will stay blocked after the kill
job_depends -d -J hung_etl_job

# Kill the running job
sendevent -E KILLJOB -J hung_etl_job

# After killing, confirm the status is TE (TERMINATED)
autorep -J hung_etl_job

# If downstream jobs must proceed AND you have verified the work was done,
# manually mark the killed job as SUCCESS
sendevent -E CHANGE_STATUS -J hung_etl_job -s SUCCESS

# Check downstream jobs are now unblocked
autorep -J downstream_job

▶ Output

Job Name ST Exit
hung_etl_job TE -- <- TERMINATED after KILLJOB

Dependents:
downstream_job: waiting on success(hung_etl_job) → condition false

After CHANGE_STATUS SUCCESS:
downstream_job: condition now true (confirm the job actually starts)

🔥KILLJOB vs term_run_time

KILLJOB is a manual kill sent by an operator. term_run_time is an automatic kill triggered by AutoSys when a job exceeds its maximum runtime. Both result in TERMINATED status. The difference is who or what initiated the kill.
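Setting term_run_time is a JIL change. A minimal sketch that generates the update; term_run_time is expressed in minutes, and term_run_time_jil is a hypothetical helper whose output you would review, test in non-prod, and then pipe into jil.

```shell
#!/usr/bin/env bash
# term_run_time_jil JOB MINUTES: prints a JIL update that makes AutoSys
# terminate JOB once it has run for MINUTES minutes (term_run_time is in
# minutes). Hypothetical helper; review the output before piping it to jil.
term_run_time_jil() {
  local job=${1:-} minutes=${2:-}
  if [[ -z $job ]] || ! [[ $minutes =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: term_run_time_jil JOB MINUTES (positive integer)" >&2
    return 2
  fi
  printf 'update_job: %s\nterm_run_time: %s\n' "$job" "$minutes"
}
```

For example, term_run_time_jil hung_etl_job 120 | jil would cap that job at two hours. A run that hits the limit ends in TERMINATED, with the same downstream consequences as a manual KILLJOB.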

📊 Production Insight

A nightly ETL job hung at 3 AM. The on-call engineer killed it with KILLJOB. The job moved to TERMINATED. Five downstream jobs were waiting on success(etl_job). They never started. At 8 AM, the dashboard was empty. The team manually ran the downstream jobs, but the ETL hadn't completed — they ran on stale data.

The mistake: Killing the job doesn't complete the work. The engineer should have investigated why it hung, fixed the root cause (a database lock), then restarted the ETL job properly.

If you MUST unblock downstream without rerunning the killed job, use CHANGE_STATUS SUCCESS after the kill. But this is a bandage — the data may still be incomplete.

Diagnosis: job_depends -d -J killed_job reports the job's dependency information; use it to see which downstream jobs are waiting on it.

Rule: KILLJOB terminates the process. It doesn't complete the work. Fix the root cause, then rerun properly. CHANGE_STATUS is for emergencies only.
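If you do take the CHANGE_STATUS route, make the "work was done" check explicit rather than a judgment call at 3 AM. A sketch assuming the job's completion can be proven by a non-empty output file; unblock_if_verified is hypothetical, and for database loads you would swap in a row-count or checksum query.

```shell
#!/usr/bin/env bash
# unblock_if_verified JOB OUTPUT_FILE: marks a killed JOB as SUCCESS only when
# OUTPUT_FILE (the artifact that proves the work finished) exists and is
# non-empty. Hypothetical helper; adapt the check to your job's real output.
unblock_if_verified() {
  local job=${1:-} output=${2:-}
  if [[ -z $job || -z $output ]]; then
    echo "usage: unblock_if_verified JOB OUTPUT_FILE" >&2
    return 2
  fi
  if [[ ! -s $output ]]; then
    echo "Not unblocking $job: $output is missing or empty. Fix the cause and RESTART instead." >&2
    return 1
  fi
  sendevent -E CHANGE_STATUS -J "$job" -s SUCCESS
}
```

A non-empty file is a weak proof on its own; pairing this check with the write-then-rename pattern in the job script makes "file exists" mean "file is complete".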

🎯 Key Takeaway

KILLJOB → TERMINATED status. Downstream success() won't trigger.

Use CHANGE_STATUS SUCCESS after kill to unblock dependencies (emergencies only).

job_depends -d -J JOB shows dependency information before you kill.

TERMINATED ≠ FAILURE ≠ SUCCESS. Know the difference.

RESTART — retrying a failed job

The RESTART event tells AutoSys to rerun a job that is in FAILURE or TERMINATED status. It's cleaner than FORCE_STARTJOB for rerunning failed jobs because it signals intent as a retry in audit logs.

Key difference from FORCE_STARTJOB: RESTART works only on FAILURE or TERMINATED jobs. FORCE_STARTJOB works on any non-running state. RESTART also respects that this is a retry — some AutoSys configurations treat retries differently for alerting purposes.

RESTART does NOT bypass conditions. The job still needs its start conditions satisfied (unless they were the reason it failed — then you have a cycle).

restart_job.sh · BASH


# Before restarting, check WHY it failed: -d shows the detailed run report, including the exit code
autorep -J failed_extract_job -d

# Inspect the command the job runs, then run it by hand on the agent to verify your fix
autorep -J failed_extract_job -q | grep command

# Restart the failed job after fixing the root cause
sendevent -E RESTART -J failed_extract_job

# Pattern: restart every job whose ST column is FA (for transient failures only)
# The awk looks for a field equal to "FA" because the date columns contain spaces
autorep -J % | awk '{ for (i = 2; i <= NF; i++) if ($i == "FA") { print $1; break } }' |
  while read -r job; do
    echo "Restarting failed job: $job"
    sendevent -E RESTART -J "$job"
  done

# For TERMINATED jobs (killed), RESTART also works
sendevent -E RESTART -J killed_job

▶ Output

Restarting failed job: extract_trades
Restarting failed job: load_positions

/* Job extract_trades: FAILURE → STARTING → RUNNING */
/* Restart treated as a new run, not a continuation */

💡Always check the failure reason before RESTART

RESTARTing without fixing the root cause just fails again. Use autorep -J JOBNAME -d to see the last run's exit code, and read the job's std_err_file for the error message. If it's a transient issue (network timeout), RESTART works. If it's a data issue (missing file), RESTART won't help.
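To keep a failing job from being restarted again and again, a runbook can cap restarts per job. A minimal sketch: restart_with_budget, reset_restart_budget and RESTART_STATE_DIR are hypothetical, and the counter is a plain file that someone resets once the root cause is fixed.

```shell
#!/usr/bin/env bash
# Sketch: cap how many times a runbook will RESTART the same job.
# RESTART_STATE_DIR and both functions are hypothetical, not AutoSys features.

RESTART_STATE_DIR=${RESTART_STATE_DIR:-/tmp/autosys_restart_budget}

# restart_with_budget JOB MAX: sends RESTART only if JOB has been restarted
# fewer than MAX times since the counter was last reset.
restart_with_budget() {
  local job=${1:-} max=${2:-} count_file count
  if [[ -z $job ]] || ! [[ $max =~ ^[0-9]+$ ]]; then
    echo "usage: restart_with_budget JOB MAX" >&2
    return 2
  fi
  mkdir -p -- "$RESTART_STATE_DIR"
  count_file="$RESTART_STATE_DIR/$job.count"
  count=$(cat -- "$count_file" 2>/dev/null || echo 0)
  if (( count >= max )); then
    echo "Not restarting $job: already restarted $count time(s). Find the root cause." >&2
    return 1
  fi
  echo $(( count + 1 )) > "$count_file"
  sendevent -E RESTART -J "$job"
}

# reset_restart_budget JOB: call once the root cause has been fixed.
reset_restart_budget() {
  rm -f -- "$RESTART_STATE_DIR/$1.count"
}
```

With a budget of 2, the third attempt is refused and the operator is pushed toward the error log instead of another blind retry.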

📊 Production Insight

An engineer restarted a failed job 6 times over 2 hours. Each restart failed in 30 seconds. The root cause was a missing directory that the upstream team needed to create. The restarts were pointless and flooded the logs.

The fix: Check the error log first. If the error is transient (connection timeout, temporary lock), RESTART is appropriate. If it's permanent (missing file, permission denied, syntax error), RESTART just creates noise.

Better approach: After a failure, read the job's std_err_file (its path is in the autorep -J JOB -q output) and grep for error keywords. If you see 'No such file' or 'Permission denied', do NOT restart: fix the root cause first.

Rule: RESTART is for TRANSIENT failures. For permanent failures, fix first, then RESTART once.
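The transient-versus-permanent triage above can be scripted as a first pass over a job's error output. The patterns are examples built from the messages mentioned here, not an exhaustive list; classify_failure is a hypothetical helper, and unknown means "read the log yourself".

```shell
#!/usr/bin/env bash
# classify_failure: reads a job's error output on stdin and prints "permanent",
# "transient", or "unknown". The patterns are illustrative; tune them to the
# messages your jobs actually produce. Permanent patterns win if both match.
classify_failure() {
  local text
  text=$(cat)
  # Errors a RESTART cannot fix: the environment or the code must change first.
  if grep -qiE 'no such file|permission denied|syntax error|command not found' <<< "$text"; then
    echo permanent
  # Errors that may clear on their own: "timeout", "timed out", refused/reset
  # connections, resource locks.
  elif grep -qiE 'timed? ?out|connection (refused|reset)|temporarily unavailable|deadlock|lock wait' <<< "$text"; then
    echo transient
  else
    echo unknown
  fi
}
```

Feed it the file named by the job's std_err_file attribute, for example classify_failure < /path/to/extract_trades.err (a hypothetical path), and only RESTART on transient.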

🎯 Key Takeaway

RESTART works on FAILURE or TERMINATED jobs only.

Check the failure reason first: autorep -d shows the exit code, and the job's std_err_file shows the error.

RESTART is for retries, not first runs. Use FORCE_STARTJOB for schedule overrides.

RESTART respects conditions. It's not a magic bypass.

CHANGE_STATUS — manual status override

CHANGE_STATUS is the nuclear option. It manually sets a job's status in the Event Server without running anything. This is how you unblock workflows when a job can't or shouldn't be rerun.

Most common use: A job was killed (KILLJOB) but the work was already done. No point rerunning. Set its status to SUCCESS to fire downstream success() conditions.

Risks: CHANGE_STATUS bypasses ALL validation. No agent communication. No process verification. You're telling AutoSys 'trust me, this job is in this state'. If you're wrong, downstream processes run on incorrect assumptions.

change_status.sh · BASH


# After a KILLJOB, mark it as SUCCESS to unblock downstream
sendevent -E CHANGE_STATUS -J hung_job -s SUCCESS

# Manually mark a job as FAILURE. Its process keeps running: CHANGE_STATUS
# only rewrites the stored status and sends nothing to the agent
sendevent -E CHANGE_STATUS -J running_job -s FAILURE

# Reset a job to INACTIVE. This does NOT stop it from running later;
# to keep a job from running, put it on hold instead
sendevent -E CHANGE_STATUS -J reset_job -s INACTIVE
sendevent -E JOB_ON_HOLD -J job_on_hold

# Verify the status change took effect (ST column of the summary report)
autorep -J hung_job

# Dangerous: Mark a job as SUCCESS that never ran
# Only do this if you have verified the work was done elsewhere
sendevent -E CHANGE_STATUS -J never_ran_job -s SUCCESS

▶ Output

/* Job status changed from TE (TERMINATED) to SU (SUCCESS) */
/* Downstream jobs with success(hung_job) now evaluate as true */
/* No actual process ran — you are asserting correctness */

⚠ CHANGE_STATUS is a manual override — use sparingly

Every CHANGE_STATUS SUCCESS on a job that didn't actually succeed is a potential data corruption event. Document every use. Audit change logs weekly. If you're using it more than once per month, you have a deeper reliability problem.
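"Document every use" and "audit weekly" are easy to automate if manual overrides go through a wrapper that writes an audit log. A sketch assuming a tab-separated log format invented for this example (timestamp, user, job, event, detail); change_status_logged, monthly_override_report and INTERVENTION_LOG are hypothetical. The report flags any month with more than one CHANGE_STATUS, matching the once-a-month rule of thumb above.

```shell
#!/usr/bin/env bash
# Sketch: route CHANGE_STATUS through a wrapper that writes an audit log,
# then report usage per month. Log format (hypothetical, tab-separated):
#   timestamp  user  job  event  detail

INTERVENTION_LOG=${INTERVENTION_LOG:-./autosys_interventions.log}

# change_status_logged JOB STATUS REASON: refuses to run without a reason
# (e.g. a ticket id), logs the attempt, then sends the event.
change_status_logged() {
  local job=${1:-} status=${2:-} reason=${3:-}
  if [[ -z $job || -z $status || -z $reason ]]; then
    echo "usage: change_status_logged JOB STATUS REASON" >&2
    return 2
  fi
  printf '%s\t%s\t%s\tCHANGE_STATUS\t%s: %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$job" "$status" "$reason" \
    >> "$INTERVENTION_LOG"
  sendevent -E CHANGE_STATUS -J "$job" -s "$status"
}

# monthly_override_report LOG: prints "YYYY-MM<TAB>count<TAB>ok|REVIEW" for
# CHANGE_STATUS entries, flagging months with more than one override.
monthly_override_report() {
  awk -F'\t' '
    $4 == "CHANGE_STATUS" { count[substr($1, 1, 7)]++ }
    END {
      for (month in count)
        printf "%s\t%d\t%s\n", month, count[month], (count[month] > 1 ? "REVIEW" : "ok")
    }' "$1" | sort
}
```

Running monthly_override_report "$INTERVENTION_LOG" from a weekly cron job turns the "deeper reliability problem" signal into something someone actually sees.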

📊 Production Insight

A team had a job that wrote a file and then a downstream job processed it. The upstream job completed SUCCESS. The file was on disk. The file watcher downstream didn't trigger due to a run_window misconfiguration. The team needed the downstream to run now.

Instead of fixing the run_window (which would have required an update_job JIL change, tested in non-prod first), they used CHANGE_STATUS on the upstream job. But the upstream job was already SUCCESS, so that did nothing.

They then tried CHANGE_STATUS on the downstream job itself, but it was INACTIVE waiting on the file, and CHANGE_STATUS doesn't trigger file detection.

Two hours of confusion later, they fixed the run_window and the workflow completed.

Lesson: CHANGE_STATUS doesn't magically trigger dependencies. It just changes the stored status. A file watcher won't trigger because you changed its status. A condition won't re-evaluate because you changed an upstream status — you need a new event (like a job completing) to trigger re-evaluation.

After CHANGE_STATUS, you may need to trigger the dependent job manually with FORCE_STARTJOB.
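Rather than assuming a downstream job started after CHANGE_STATUS, you can poll for it. This sketch assumes the autorep summary layout with a two-letter ST column (SU, FA, TE, RU, IN, and so on) and finds that column by value rather than position, because the date columns contain spaces. autorep_status and wait_for_start are hypothetical helpers, and the status-code list may need extending for your version.

```shell
#!/usr/bin/env bash
# autorep_status JOB: reads `autorep -J JOB` summary output on stdin and prints
# JOB's two-letter ST code. It looks for the first field that is a known status
# code instead of a fixed column, because the date columns contain spaces.
# The code list is an assumption; extend it for your AutoSys version.
autorep_status() {
  awk -v job="$1" '
    $1 == job {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^(AC|FA|IN|OH|OI|PE|QU|RE|RU|ST|SU|TE)$/) { print $i; exit }
    }'
}

# wait_for_start JOB ATTEMPTS [INTERVAL_SECONDS]: polls until JOB is STARTING
# or RUNNING, or its status differs from the first check.
# Returns 1 if that never happens within ATTEMPTS polls.
wait_for_start() {
  local job=$1 attempts=$2 interval=${3:-5} initial st="" i
  initial=$(autorep -J "$job" | autorep_status "$job")
  for (( i = 0; i < attempts; i++ )); do
    st=$(autorep -J "$job" | autorep_status "$job")
    if [[ $st == ST || $st == RU || $st != "$initial" ]]; then
      echo "$job is $st"
      return 0
    fi
    sleep "$interval"
  done
  echo "$job still ${st:-unknown} after $attempts checks; consider FORCE_STARTJOB" >&2
  return 1
}
```

For example, wait_for_start downstream_job 12 5 waits about a minute. A job that starts and finishes between two polls still counts, because its status changes; one that reruns from SU straight back to SU would be missed, so keep the interval short.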

🎯 Key Takeaway

CHANGE_STATUS manually sets job status. No agent call. No verification.

Use after KILLJOB to unblock downstream when work is actually done.

Does NOT trigger dependency re-evaluation automatically.

Document every use. Frequent use = process problem, not tool problem.

🗂 sendevent Commands — When to Use Which

Each command has a specific job state and intent. Using the wrong one causes cascading failures.

Event	Respects conditions?	Works on status	What actually happens	Typical use
FORCE_STARTJOB	No, bypasses all	Any non-running state	Job starts immediately, no condition checks	Run a job outside its schedule (fixing a missed window)
STARTJOB	Yes	INACTIVE / ACTIVATED	Job starts only if its time and dependency conditions are met	Trigger after fixing a condition (rarely used)
RESTART	Yes	FAILURE / TERMINATED	Reruns the failed job, recorded as a retry	Rerun after fixing a transient failure
KILLJOB	N/A	RUNNING only	SIGTERM sent to the process via the agent → TERMINATED status	Terminate a hung or infinite-loop job
CHANGE_STATUS -s SUCCESS	N/A	Any	Manually sets status without running anything	Unblock downstream after manual verification
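The "Works on status" column can double as a pre-flight check in scripts. This sketch encodes only the table above, not a full model of AutoSys state transitions; allowed_events and event_allowed are hypothetical, and treating ST (STARTING) as running is an assumption.

```shell
#!/usr/bin/env bash
# allowed_events ST: prints the sendevent events that the table allows for a
# job whose two-letter autorep status is ST.
allowed_events() {
  local st=$1 events=()
  case $st in
    RU) events+=(KILLJOB) ;;          # KILLJOB: RUNNING only
    ST) ;;                            # STARTING: treated as running here
    *)  events+=(FORCE_STARTJOB) ;;   # FORCE_STARTJOB: any non-running state
  esac
  case $st in
    IN|AC) events+=(STARTJOB) ;;      # STARTJOB: INACTIVE / ACTIVATED
    FA|TE) events+=(RESTART) ;;       # RESTART: FAILURE / TERMINATED
  esac
  events+=(CHANGE_STATUS)             # CHANGE_STATUS: any status
  echo "${events[*]}"
}

# event_allowed EVENT ST: exit status 0 if the table allows EVENT for ST.
event_allowed() {
  local event=$1 st=$2
  [[ " $(allowed_events "$st") " == *" $event "* ]]
}
```

A wrapper can call event_allowed RESTART "$st" before sending the event and stop with a clear message instead of letting the scheduler reject it later.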

🎯 Key Takeaways

FORCE_STARTJOB bypasses ALL conditions — time AND dependencies. Fix dependencies first.
KILLJOB → TERMINATED. Downstream success() won't fire. Use CHANGE_STATUS to unblock if work was done.
RESTART is for retrying FAILED/TERMINATED jobs after transient failures. Check failure reason first.
CHANGE_STATUS is a manual override that doesn't trigger dependency re-evaluation automatically.
Always check conditions before force-starting: autorep -J JOB -q | grep condition.
Document every manual intervention. If you're using these commands weekly, your automation is broken.

⚠ Common Mistakes to Avoid

✕Force-starting a job without checking conditions first

Symptom

Job runs but produces corrupt output because upstream dependencies were skipped. Downstream processes continue on bad data. Error is caught days later during reconciliation.

Fix

Always run autorep -J JOB -q | grep condition before FORCE_STARTJOB. If conditions exist, fix the dependency chain instead. Use RESTART on failed dependencies, then let the original job start naturally.

✕Killing a job but not unblocking downstream dependencies

Symptom

Killed job shows TERMINATED. Downstream jobs stay INACTIVE waiting for success(). The team manually reruns the downstream jobs, which then run without the killed job's work.

Fix

After KILLJOB, run job_depends -d -J killed_job to see what depends on it. If downstream jobs are waiting, either:
- RESTART the killed job (if the work wasn't done)
- CHANGE_STATUS to SUCCESS (if the work was done elsewhere)

✕Using FORCE_STARTJOB when RESTART would be appropriate

Symptom

Job fails transiently. Engineer force-starts instead of restarting. Audit logs show FORCE_STARTJOB on a FAILED job, making it look like an emergency bypass when it was just a retry.

Fix

Use RESTART for retrying failed jobs. It signals intent correctly in logs and respects that this is a retry (some alerting systems treat RESTART differently from FORCE_STARTJOB).

✕Assuming CHANGE_STATUS triggers dependency re-evaluation

Symptom

Engineer changes a job from FAILURE to SUCCESS. Downstream jobs with success(upstream) still don't start. Team assumes AutoSys is broken.

Fix

CHANGE_STATUS changes the stored status but does NOT trigger a full dependency re-evaluation. After CHANGE_STATUS, either:
- Check the downstream job's current condition status with job_depends -c -J downstream_job
- Force-start the downstream job directly
- For critical paths, actually rerun the upstream job

✕Using FORCE_STARTJOB on a child job inside a non-running BOX

Symptom

Force-started child job runs, but the BOX still shows INACTIVE or FAILURE. Downstream jobs that depend on the BOX never run because the BOX never succeeded.

Fix

Force-start the BOX, not the child. The child will start naturally if its conditions are met. If you must run only the child, the BOX state will be inconsistent — consider moving the job outside the BOX for manual runs.

Interview Questions on This Topic

Q: What is the difference between FORCE_STARTJOB and STARTJOB? (Junior)
FORCE_STARTJOB starts the job immediately, bypassing ALL starting conditions including date_conditions, start_times, and condition dependencies. STARTJOB respects conditions: the job will only start if its time schedule and dependency conditions are all true. In practice, STARTJOB is rarely used because if conditions are already true, the job would have started on its own. FORCE_STARTJOB is the emergency override. STARTJOB might be used after manually fixing a condition that AutoSys hasn't re-evaluated yet.
Q: What happens to a job's status after KILLJOB? (Junior)
KILLJOB moves the job to TERMINATED (TE) status. The job does NOT go to FAILURE or SUCCESS. The exit code is typically -1 or a signal number.
Critical implications: Downstream jobs with condition: success(killed_job) will NOT trigger because success was never declared. To unblock downstream jobs, you must either:
- RESTART the killed job (if it needs to actually run)
- CHANGE_STATUS -s SUCCESS (if the work was done elsewhere)
KILLJOB sends SIGTERM to the process through the agent. If the process doesn't exit, AutoSys can escalate to SIGKILL; the exact signals and delay depend on version and configuration.
Q: What is the difference between RESTART and FORCE_STARTJOB for a failed job? (Mid-level)
Both can rerun a failed job, but they signal different intent:
- RESTART works only on FAILURE or TERMINATED jobs. It's semantically a retry: audit logs show 'job restarted after failure'.
- FORCE_STARTJOB works on any non-running state. It's semantically an emergency override: audit logs show 'job force-started', which looks like bypassing conditions.
In practice, use RESTART for rerunning failed jobs after fixing the root cause. Use FORCE_STARTJOB only when you need to bypass conditions (like running a job outside its schedule). RESTART also respects that this is a retry; some AutoSys profiles treat retry alarms differently (e.g., don't page on-call for a retry that succeeds).
Q: How would you unblock dependent jobs after a job was killed? (Mid-level)
There are two approaches, depending on whether the work was completed.
If the killed job's work IS completed (e.g., the job hung after writing its output):
1. Verify the work is actually done (check output files, database records)
2. Run sendevent -E CHANGE_STATUS -J killed_job -s SUCCESS
3. Downstream jobs with success(killed_job) will now evaluate as true
4. If they don't start, FORCE_STARTJOB them
If the killed job's work is NOT completed:
1. Fix the root cause (why it hung)
2. Run sendevent -E RESTART -J killed_job
3. Let it complete successfully; downstream jobs will then trigger naturally
Never just kill and walk away: downstream jobs will be blocked indefinitely.
Q: If term_run_time terminates a job, what status does it move to? (Junior)
TERMINATED (TE), the same status as KILLJOB. When term_run_time expires, AutoSys kills the process the same way it handles KILLJOB (SIGTERM, with escalation to SIGKILL if the process doesn't respond). The key difference is who initiated the termination:
- term_run_time: automatic, based on the job attribute
- KILLJOB: manual, from an operator command
Both result in TE status. Downstream success() conditions will NOT trigger in either case. You still need RESTART or CHANGE_STATUS to unblock workflows.
Q: What are the risks of using CHANGE_STATUS to mark a job as SUCCESS when it didn't actually run? (Senior)
Risks include:
1. Data corruption: Downstream jobs assume the upstream work is done. If it's not, they process incomplete or stale data.
2. Missing audit trail: The job never ran, so there are no logs, no output files, no metrics. Hard to debug later.
3. State inconsistency: The Event Server says SUCCESS, but the real world says otherwise. Any external monitoring that checks actual outputs will see a mismatch.
4. Cascading assumptions: A downstream job might read a file that the 'successful' job was supposed to write. The file doesn't exist, so the downstream job fails anyway.
Safe use: Only after MANUAL verification that the work was done by another means (e.g., a human ran the script manually, or the output was verified in another system). Document every CHANGE_STATUS with a comment in your ticketing system. If you're using CHANGE_STATUS more than once a month, your job design or monitoring has a problem.

Frequently Asked Questions

How do I force start an AutoSys job from the command line?

Use sendevent -E FORCE_STARTJOB -J jobname. This starts the job immediately, bypassing all starting conditions including date_conditions, start_times, and condition dependencies.

Before force-starting, always check dependencies: autorep -J jobname -q | grep condition. If the job has dependencies, fix them first — force-starting a job that depends on missing data produces corrupt output.

What happens after KILLJOB in AutoSys?

The job moves to TERMINATED status. Any downstream jobs with condition: success(killed_job) will not run because success was never declared.

If you need downstream jobs to proceed and the work was actually done:
1. sendevent -E KILLJOB -J jobname
2. sendevent -E CHANGE_STATUS -J jobname -s SUCCESS

If the work wasn't done, fix the root cause then sendevent -E RESTART -J jobname.

What is the difference between RESTART and FORCE_STARTJOB?

Both can run a job immediately, but they signal different intent:
- RESTART: Works only on FAILURE or TERMINATED jobs. Semantically a retry after failure. Cleaner in audit logs.
- FORCE_STARTJOB: Works on any non-running state. Bypasses all conditions. Use for schedule overrides, not routine retries.

For a failed job, prefer RESTART. For a job that never ran (INACTIVE) that you need to run outside its schedule, use FORCE_STARTJOB.

Can I force start a job that is inside a BOX?

Yes, but it's usually the wrong approach. Force-starting a child job inside a non-running BOX leads to inconsistent state — the BOX may still show INACTIVE while the child runs.

Better approach: Force-start the BOX itself: sendevent -E FORCE_STARTJOB -J BOXNAME. The child job will start naturally if its conditions are met.

If you must run only the child, either move it outside the BOX temporarily or accept that the BOX state will be inconsistent.

What status does a job have after term_run_time kills it?

TERMINATED (TE), the same status as KILLJOB. The exit code is typically -1 or a signal number. Check autorep -J jobname -d to see the exact exit code and confirm it was term_run_time that caused it.

Downstream success() conditions will NOT trigger. You need to either RESTART the job (if it needs to rerun) or CHANGE_STATUS to SUCCESS (if the work was already done before timeout).

Does CHANGE_STATUS automatically trigger downstream jobs?

No. CHANGE_STATUS updates the stored status in the Event Server but does NOT automatically trigger a full dependency re-evaluation. This is a common misconception.

After CHANGE_STATUS, you may need to

Check the downstream job's current condition status with job_depends -c -J downstream_job
Force-start the downstream job directly
For critical paths, the upstream job may need to actually rerun

Test your specific case in a dev environment before relying on CHANGE_STATUS to unblock workflows.


Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
