
AutoSys Status Codes: 12 Statuses You Must Know

In this tutorial, you'll learn
  • PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.
  • ON_HOLD + OFF_HOLD = immediate start. ON_ICE + OFF_ICE = next cycle only. Master this difference.
  • ACTIVATED is normal — job waiting inside running box. Don't force-start it.
[Figure: AutoSys Job Status Lifecycle. State machine diagram showing the AutoSys job statuses (INACTIVE, ACTIVATED, STARTING, RUNNING, SUCCESS, FAILURE, ON_HOLD, ON_ICE, PEND_MACH, TERMINATED) with transition arrows.]
Quick Answer
  • SUCCESS (SU): Job completed with exit code 0. Normal. No action.
  • FAILURE (FA): Job exited non-zero. Check std_err_file before restarting.
  • RUNNING (RU): Job executing. Monitor runtime vs expected duration.
  • PEND_MACH (PE): Agent unreachable. First check: disk space on agent (df -h).
  • ON_HOLD (OH): Manually held. OFF_HOLD starts immediately if conditions met.
  • ON_ICE (OI): Suspended. OFF_ICE waits for next scheduling cycle.
  • ACTIVATED (AC): Job inside RUNNING box, waiting its turn. Normal.
  • TERMINATED (TE): Job killed (term_run_time or KILLJOB). Investigate why.
  • Production rule: 20+ jobs in PEND_MACH on same machine = disk full. Clear space, restart agent.
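The production rule above can be checked mechanically. The following is a minimal sketch, not an AutoSys feature: it assumes you have already built a three-column inventory (job name, machine, status code) by your own means, and both the function name `machines_with_bulk_pe` and that input layout are hypothetical.

```shell
# Print every machine that has at least THRESHOLD jobs in PE status.
# Input (stdin): one job per line as "job_name machine status".
# This layout is an assumption for illustration, not autorep's native format.
machines_with_bulk_pe() {
    local threshold="${1:-20}"   # 20 mirrors the production rule above
    awk -v t="$threshold" '
        $3 == "PE" { count[$2]++ }                 # tally PE jobs per machine
        END {
            for (m in count)
                if (count[m] >= t) print m, count[m]
        }' | sort
}
```

A machine printed by this function is a candidate for the disk-space check before any network investigation.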
🚨 START HERE

Status Codes — 60-Second Diagnosis

Run these commands when you see problem statuses
🟡

FAILURE — find why

Immediate Action: Check error file and last runs
Commands
autorep -J JOBNAME -q | grep std_err_file
autorep -J JOBNAME -d | grep -A 5 FAILURE
Fix Now: cat /path/to/error/file
🟡

PEND_MACH — check disk first

Immediate Action: SSH to agent, check disk space
Commands
ssh agent-host 'df -h'
autoping -m agent-host
Fix Now: Clean disk space, restart agent
🟡

Check if job is ON_HOLD or ON_ICE

Immediate Action: View job status and release
Commands
autorep -J JOBNAME | grep -E 'OH|OI'
sendevent -E JOB_OFF_HOLD -J JOBNAME (or JOB_OFF_ICE)
Fix Now: Verify if hold was intentional before releasing
🟡

Job stuck at ACTIVATED too long

Immediate Action: Check conditions and parent box
Commands
autorep -J JOBNAME -q | grep condition
autorep -J PARENT_BOX -d | grep status
Fix Now: Check why conditions aren't met
Production Incident

The 2 AM PEND_MACH That Wasn't a Network Problem

Nearly fifty jobs went to PEND_MACH at 2 AM. The on-call engineer spent 90 minutes checking network routes, firewall rules, and agent configurations. The cause was a 100% full /var partition. A df -h would have shown it in 10 seconds.
Symptom: autorep shows 47 jobs all with PE status on the same agent machine. autoping -m agent-host returns MISSING. No recent changes to network or firewall.
Assumption: The engineer assumed network or firewall because 'agent unreachable' sounds like connectivity. They didn't check the most common cause first.
Root cause: The Remote Agent service writes logs to /var/log/autosys. When /var fills to 100%, the agent process stops. The agent doesn't crash with an error; it just exits. The AutoSys Event Processor keeps trying to contact it, sees no response, and leaves jobs in PEND_MACH. The agent machine had a 20GB /var partition. Log rotation had failed 3 days earlier, and the partition filled completely.
Fix:
1. SSH to agent machine: ssh agent-host
2. df -h → /var 100% used
3. Find large files: du -sh /var/log/* | sort -h
4. Rotate or delete old logs: sudo logrotate -f /etc/logrotate.conf
5. Restart agent: /opt/CA/WorkloadAutomationAE/agent/bin/agent_start
6. Verify: autoping -m agent-host → ACTIVE
7. Jobs will auto-resume from PEND_MACH to RUNNING
Prevention: Set up disk space monitoring at 80% for agent machines. Auto-rotate logs weekly.
Key Lesson
  • PEND_MACH on multiple jobs → same machine → disk space is #1 cause.
  • Diagnosis order: disk space → agent process → network → everything else.
  • df -h is the first command, not the last. It saves hours.
  • Agent machines need disk monitoring. Full disk stops the agent silently.
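The 80% monitoring threshold suggested above can be sketched as a small filter over df output. This is an illustrative sketch only: the function name `flag_full_filesystems` is hypothetical, it expects POSIX-format output (`df -P`) on stdin, and it assumes mount points contain no spaces.

```shell
# Print "mount_point usage%" for every filesystem at or above THRESHOLD percent.
# Reads `df -P` output on stdin so it works over ssh or on a saved report.
flag_full_filesystems() {
    local threshold="${1:-80}"   # percent; 80 matches the prevention advice above
    awk -v t="$threshold" '
        NR == 1 { next }                   # skip the df header line
        {
            use = $5
            sub(/%/, "", use)              # "100%" -> "100"
            if (use + 0 >= t + 0) printf "%s %s%%\n", $6, use
        }'
}
```

Typical use would be `ssh agent-host 'df -P' | flag_full_filesystems 80`, run from cron or a monitoring agent so a filling partition is caught before the agent stops.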
Production Debug Guide

FA, PE, TE, OH/OI — the statuses that need your attention

FAILURE (FA): job exited with non-zero code
1. Check std_err_file in job definition.
2. autorep -J JOB -d (events of the latest run; add -r -1 for the previous run).
3. Fix root cause (code, missing file, permission).
4. FORCE_STARTJOB after fix.

PEND_MACH (PE): agent unreachable
1. ssh agent-host, df -h (full disk = #1).
2. ps -ef | grep autosys (agent running?).
3. autoping -m AGENT (from server).
4. Clear disk space, restart agent.

TERMINATED (TE): job was killed
1. autorep -J JOB -q | grep term_run_time (is a limit set, and was it exceeded?).
2. Check if KILLJOB was sent manually.
3. Increase term_run_time or fix hang reason.
4. Rerun the job.

ON_HOLD (OH) or ON_ICE (OI): job intentionally held
1. Verify hold is intentional (sendevent history).
2. Off hold: sendevent -E JOB_OFF_HOLD -J JOB (runs immediately if conditions met).
3. Off ice: sendevent -E JOB_OFF_ICE -J JOB (waits for next schedule).

You'll stare at autorep output every day. SU, FA, RU, OH, OI, PE, TE, AC, IN, ST, QU, RE.

That's 12 status codes. Some are normal. Some need immediate action. Some are traps that operations teams misinterpret for hours.

This article covers each status — what it means, what causes it, and exactly what to do. The difference between ON_HOLD and ON_ICE appears in every AutoSys interview. Master it here.

Normal lifecycle statuses

These are the statuses a job moves through during a healthy execution. You should see them regularly and not be alarmed.

INACTIVE (IN): Job hasn't been triggered yet. Waiting for its schedule or conditions. This is the starting state for most scheduled jobs.

ACTIVATED (AC): The job's parent BOX is in RUNNING state, but the job itself hasn't started yet because its own conditions aren't met. This is normal — it means the box is active and the job is queued.

STARTING (ST): Brief transition state. The Event Processor sent the start command to the agent, and the agent is spinning up the process. Usually lasts under 3 seconds.

RUNNING (RU): Job is executing on the agent machine. Monitor runtime against expected duration. A job that runs more than 2x longer than usual may be hung.

SUCCESS (SU): Job completed with exit code 0. Normal completion. No action needed.

RESTART (RE): Job is being automatically retried after a failure (n_retrys configured). This is normal for jobs with retry logic. Only be concerned if the final attempt also fails.

check_status.sh · BASH
# Check status of a single job
autorep -J daily_report

# Check status with full details (start/end times, exit code)
autorep -J daily_report -d

# List all jobs and their current status
autorep -J %

# Filter by status (SU = SUCCESS, FA = FAILURE, RU = RUNNING)
# autorep -s selects the summary report, not a status, so filter with grep
autorep -J % | grep -w SU
autorep -J % | grep -w FA
autorep -J % | grep -w RU

# Check if a job is hung (running longer than expected)
autorep -J long_job -d | grep "Run Time"
▶ Output
Job Name ST Exit Start Time End Time
------------------------------------------------------------
daily_report SU 0 03/19/2026 02:00:01 03/19/2026 02:08:43
weekly_reconcile FA 1 03/19/2026 03:00:00 03/19/2026 03:01:22
monthly_summary OH -- -- --
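When the report is long, a per-status tally is easier to scan than the raw listing. The sketch below assumes the simplified layout shown in the output above, with the two-letter status in the second column; real autorep reports may place the status elsewhere, so adjust the column number accordingly. The function name `count_by_status` is hypothetical.

```shell
# Count jobs per two-letter status code.
# Assumes the status is the second whitespace-separated field, as in the
# simplified output above; header and separator lines are skipped because
# their second field is not two capital letters.
count_by_status() {
    awk '$2 ~ /^[A-Z][A-Z]$/ { n[$2]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

Fed the three sample rows above, the function reports one job each for FA, OH, and SU.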
Mental Model
The Package Tracking Model
Think of each job as a package moving through a delivery network. PEND_MACH is 'truck broke down'. ON_HOLD is 'hold at depot'. The analogy makes the statuses intuitive.
  • INACTIVE → ACTIVATED → STARTING → RUNNING → SUCCESS (normal flow)
  • FAILURE, TERMINATED, PEND_MACH = delivery exceptions
  • ON_HOLD/ON_ICE = manual holds (customer requested hold)
  • The Event Processor is the dispatch center. Agents are the delivery trucks.
📊 Production Insight
A team saw a job in ACTIVATED status for 3 hours and assumed it was stuck. They force-started it. The job ran and failed because its dependency hadn't completed yet. The original job would have started 10 minutes later when the dependency finished.
ACTIVATED is not stuck. It means 'waiting for conditions inside a running box'. The job was fine. The team's impatience caused a failure.
Rule: ACTIVATED only becomes a problem if the parent box has been RUNNING for longer than the expected total duration of all jobs in the box. Otherwise, it's normal.
🎯 Key Takeaway
IN → AC → ST → RU → SU is normal job lifecycle.
ACTIVATED means 'waiting inside running box', not stuck.
RUNNING for 2x expected = possibly hung.
SUCCESS doesn't guarantee correct output — validate separately.
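The "2x expected" rule of thumb is simple arithmetic and can be sketched as a helper. This is an illustration under stated assumptions: the function name `is_possibly_hung` is hypothetical, times are passed as Unix epoch seconds, and the expected duration is something you supply from your own run history.

```shell
# Succeed (exit 0) when a job has been running longer than
# FACTOR x its expected duration; fail (exit 1) otherwise.
#   $1 start time (epoch seconds)   $2 current time (epoch seconds)
#   $3 expected duration (minutes)  $4 factor (default 2, per the rule above)
is_possibly_hung() {
    local start_epoch="$1" now_epoch="$2" expected_min="$3" factor="${4:-2}"
    local elapsed_min=$(( (now_epoch - start_epoch) / 60 ))
    [ "$elapsed_min" -gt $(( expected_min * factor )) ]
}
```

For example, `is_possibly_hung "$start" "$(date +%s)" 10 && echo "investigate"` prints a warning once a 10-minute job has passed the 20-minute mark. The same check applies to a parent box when judging whether an ACTIVATED job deserves attention.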
Is this status normal or a problem?
If: Job in ACTIVATED for < 1 hour, box still RUNNING
Use: Normal. Job is waiting for its conditions inside the box.
If: Job in RUNNING for > 2x expected duration
Use: Probably hung. Investigate. May need KILLJOB.
If: Job in PEND_MACH, other jobs on same machine also PEND_MACH
Use: Problem. Agent machine likely down or disk full. Check immediately.
If: Job in FAILURE, first time in weeks
Use: Problem. Investigate root cause before restarting.
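The decision list above can be condensed into a lookup that an on-call script might print next to each status code. This is a minimal sketch; the function name `triage_status` and the wording of each hint are illustrative choices, not AutoSys output.

```shell
# Map a two-letter status code to a first-response hint,
# following the normal/problem split used in this article.
triage_status() {
    case "$1" in
        IN|AC|ST|RU|SU|RE) echo "normal: no action, monitor duration" ;;
        FA)    echo "problem: read std_err_file before any restart" ;;
        PE)    echo "problem: check disk space on the agent first (df -h)" ;;
        TE)    echo "problem: check term_run_time and KILLJOB events" ;;
        QU)    echo "watch: machine load exceeds max_load, usually clears on its own" ;;
        OH|OI) echo "check: confirm the hold was intentional" ;;
        *)     echo "unknown: look up '$1' in your AutoSys documentation" ;;
    esac
}
```

Keeping the hints in one function means the runbook and the tooling cannot drift apart.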

All status codes with abbreviations

The autorep command uses two-letter abbreviations for status codes. Here they all are, including the less common ones:

Normal/Interim: AC (ACTIVATED), IN (INACTIVE), ST (STARTING), RU (RUNNING), SU (SUCCESS), RE (RESTART)

Problem/Unusual: FA (FAILURE), PE (PEND_MACH), TE (TERMINATED), QU (QUE_WAIT)

Manual holds: OH (ON_HOLD), OI (ON_ICE)

Less common: DE (DELAYED, job waiting for start time), OP (OPERATOR, operator held), SP (STOPPED), UP (UP_STREAM, waiting on upstream), WD (WAIT_DEP, waiting on file watcher)

The difference between ON_HOLD and ON_ICE is the most misunderstood. ON_HOLD + OFF_HOLD = start immediately if conditions met. ON_ICE + OFF_ICE = wait for next scheduling cycle.

status_abbreviations.txt · BASH
AC  ACTIVATED    -- Box is RUNNING but this job hasn't started yet
DE  DELAYED      -- Job waiting to meet its start time criteria
FA  FAILURE      -- Job completed with non-zero exit code
IN  INACTIVE     -- Job hasn't been triggered; waiting for schedule
OH  ON_HOLD      -- Manually held; won't run until released
OI  ON_ICE       -- Suspended; won't run while on ice, even if its conditions are met
OP  OPERATOR     -- Operator held
PE  PEND_MACH    -- Waiting for target machine (agent offline)
QU  QUE_WAIT     -- Waiting for available machine load/resources
RE  RESTART      -- Job is being restarted after failure
RU  RUNNING      -- Job is currently executing
ST  STARTING     -- Job sent to agent; waiting for execution to begin
SU  SUCCESS      -- Job completed with exit code 0
TE  TERMINATED   -- Job was killed (term_run_time, KILLJOB event, etc.)
UP  UP_STREAM    -- Waiting on upstream dependencies
WD  WAIT_DEP     -- Waiting on file watcher condition
⚠ ON_HOLD vs ON_ICE — the most misunderstood difference
ON_HOLD: OFF_HOLD starts job immediately if conditions are currently satisfied. ON_ICE: OFF_ICE does NOT start immediately — it waits for conditions to reoccur in the next scheduling cycle. This difference causes major operational confusion.
📊 Production Insight
A financial firm put a midnight reporting job ON_ICE at 2 PM to prevent it from running during database maintenance. After maintenance finished at 4 PM, they did OFF_ICE expecting it to run immediately. It didn't. The job ran at midnight as originally scheduled. The report was 8 hours late.
The team confused ON_ICE with ON_HOLD. They should have used ON_HOLD for a temporary pause during business hours.
Rule: Use ON_HOLD when you want the job to run as soon as you're ready. Use ON_ICE when you want the job to return to its normal schedule without running out-of-cycle.
🎯 Key Takeaway
ON_HOLD + OFF_HOLD = immediate start (if conditions met).
ON_ICE + OFF_ICE = next scheduled cycle only.
Use ON_HOLD for temporary pauses (maintenance, migrations).
Use ON_ICE for permanent schedule changes or to prevent out-of-cycle runs.
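The release behavior described above fits in a few lines of logic. The sketch below only models the rule as this article states it; it does not talk to AutoSys, and the function name `release_outcome` and its yes/no argument are hypothetical.

```shell
# Model what happens when a held job is released.
#   $1 hold type: ON_HOLD or ON_ICE
#   $2 whether the job's starting conditions are currently met: yes or no
release_outcome() {
    local state="$1" conditions_met="$2"
    case "$state" in
        ON_HOLD)
            if [ "$conditions_met" = yes ]; then
                echo "starts now"
            else
                echo "waits for conditions"
            fi ;;
        ON_ICE)
            # Conditions that are already met do not count after an ice release.
            echo "waits for next scheduling cycle" ;;
        *)
            echo "not held"
            return 1 ;;
    esac
}
```

Running `release_outcome ON_ICE yes` makes the trap explicit: even with every condition satisfied, an ice release does not start the job.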

The problem statuses and what to do

These statuses almost always require your attention. Each has a specific diagnosis path.

problem_statuses.sh · BASH
# ── FAILURE: job failed ──────────────────────────────────────────
# Check the error log first
cat /logs/autosys/daily_report.err
# Then check the event log
autorep -J daily_report -d
# Review the previous run (-r takes a run number; negative values count back)
autorep -J daily_report -d -r -1
# Retry after fixing the underlying issue
sendevent -E FORCE_STARTJOB -J daily_report

# ── PEND_MACH: agent machine offline ─────────────────────────────
# First: check disk space on agent (MOST COMMON)
ssh prod-server-01 'df -h'
# Then check agent status
autoping -m prod-server-01
autorep -M prod-server-01
# Restart agent after fixing issue
/opt/CA/agent/bin/agent_start

# ── TERMINATED: job was killed ────────────────────────────────────
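# Check the configured limit (JIL definition), then look for a manual kill
autorep -J slow_job -q | grep term_run_time
autorep -J slow_job -d | grep KILLJOB
# If due to term_run_time, increase it (JIL statements go through the jil command):
jil <<'EOF'
update_job: slow_job
term_run_time: 180
EOF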

# ── QUE_WAIT: machine overloaded ──────────────────────────────────
# Check machine load limits
autorep -M machine-name
autorep -J jobname -q | grep load
▶ Output
FAILURE — Error in /logs/autosys/daily_report.err:
ERROR: Database connection timeout after 30s

PEND_MACH — prod-server-01: MISSING (agent service down)
Disk usage on prod-server-01: /apps 100% full — agent stopped

TERMINATED — term_run_time: 60 (exceeded). Job ran for 62 minutes.
🔥 QUE_WAIT vs PEND_MACH: subtle but important difference
PEND_MACH = agent unreachable (machine down or agent stopped). QUE_WAIT = agent reachable but machine too busy (load exceeds max_load). QUE_WAIT resolves automatically when load drops. PEND_MACH requires intervention.
📊 Production Insight
A team saw a job in QUE_WAIT and assumed the agent was down. They restarted the agent, which cleared the queue, but the underlying high load was caused by a different job consuming all CPU. The high-load job continued to run, and QUE_WAIT returned immediately.
QUE_WAIT doesn't mean 'agent broken'. It means 'machine is too busy right now'. AutoSys is protecting the machine from overload.
Diagnosis: autorep -M machine-name shows the max_load setting and current load. If load > max_load, jobs wait.
Fix: Increase max_load on the machine definition, or reduce load from other jobs. Don't restart the agent — that doesn't solve the load issue.
Rule: QUE_WAIT = load problem. PEND_MACH = agent/network problem. Don't confuse them.
🎯 Key Takeaway
FAILURE: Check error log first. Never restart without investigating.
PEND_MACH: First check disk space (df -h). 90% of cases.
TERMINATED: term_run_time or KILLJOB. Increase runtime or fix hang.
QUE_WAIT: Machine overload. Increase max_load or reduce load.
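The QUE_WAIT rule ("if load > max_load, jobs wait") can be sketched as a comparison of the two numbers read from autorep -M. This is an illustration only: the function name `expected_queue_state` is hypothetical, the values are assumed to be whole numbers you copy from the machine report, and the sketch does not parse autorep output itself.

```shell
# Predict whether new jobs on a machine will queue.
#   $1 max_load from the machine definition   $2 current load
# Both are assumed to be integers taken from `autorep -M machine-name`.
expected_queue_state() {
    local max_load="$1" current_load="$2"
    if [ "$current_load" -gt "$max_load" ]; then
        echo "QUE_WAIT expected: reduce load or raise max_load"
    else
        echo "capacity available: look elsewhere if jobs are not starting"
    fi
}
```

If the second message appears while jobs still sit in QU, the machine report and the job's own load attribute are the next things to compare.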
🗂 AutoSys Status Codes — Complete Reference
12+ statuses, their abbreviations, normalcy, and actions
| Status | Abbrev. | Normal or problem? | Typical cause | Action needed? |
|---|---|---|---|---|
| INACTIVE | IN | Normal | Waiting for schedule/conditions | No |
| ACTIVATED | AC | Normal | Box is RUNNING, job waiting its turn | No, but monitor if box runs too long |
| STARTING | ST | Normal | Agent spinning up process | No, brief transition |
| RUNNING | RU | Normal | Job executing | No, but monitor duration |
| SUCCESS | SU | Normal | Exit code 0 | No |
| FAILURE | FA | Problem | Non-zero exit code | Yes, investigate and fix |
| ON_HOLD | OH | Normal (if intentional) | Manual hold (sendevent) | Check if intentional |
| ON_ICE | OI | Normal (if intentional) | Manual freeze (sendevent) | Check if intentional |
| PEND_MACH | PE | Problem | Agent unreachable (full disk #1) | Yes, check agent machine |
| QUE_WAIT | QU | Possible problem | Machine load > max_load | Monitor, resolves automatically |
| TERMINATED | TE | Problem (usually) | Killed by term_run_time or manual | Investigate cause |
| RESTART | RE | Normal | Auto-retry after failure | Monitor for continued failure |

🎯 Key Takeaways

  • PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.
  • ON_HOLD + OFF_HOLD = immediate start. ON_ICE + OFF_ICE = next cycle only. Master this difference.
  • ACTIVATED is normal — job waiting inside running box. Don't force-start it.
  • FAILURE requires investigation before restart. Check error log first, always.
  • QUE_WAIT = machine overload. PEND_MACH = agent unreachable. Don't confuse them.
  • TERMINATED = job was killed. Check if term_run_time or manual KILLJOB.
  • autorep -d shows detail (start/end times). autorep -q shows JIL definition. Use both.

⚠ Common Mistakes to Avoid

    Confusing ON_HOLD and ON_ICE
    Symptom

    A job is put ON_ICE during maintenance. After maintenance, OFF_ICE is issued. The job doesn't start. Team assumes AutoSys is broken. The job runs at its scheduled time hours later.

    Fix

    ON_HOLD: release starts immediately if conditions met. ON_ICE: release waits for next scheduling cycle. Use ON_HOLD for temporary pauses during business hours.

    Not checking PEND_MACH machine disk space first
    Symptom

    Engineer spends 2 hours checking network routes, firewalls, agent config. The cause is a 100% full /var partition. df -h would have shown it in 10 seconds.

    Fix

    For PEND_MACH, always run df -h on the agent machine first. Full disk is the cause in 90% of cases. Only then check network and agent process.

    Restarting jobs in FAILURE without first reading the error log
    Symptom

    Job fails. Engineer force-starts it immediately. It fails again. Repeat 5 times. The error log shows 'database connection timeout'. The database is down. Restarting won't help.

    Fix

    Before any restart, check std_err_file and the job's event detail (autorep -J jobname -d). Understand why it failed. Fix the root cause, then restart once.

    Not distinguishing INACTIVE from ON_ICE
    Symptom

    Team sees a job in INACTIVE and thinks it's frozen. They force-start it, but it was just waiting for its schedule. It runs early and processes incomplete data.

    Fix

    INACTIVE is normal — the job hasn't met its start conditions yet. It's not an error. Check start_times and conditions before intervening.

    Treating ACTIVATED as a stuck state
    Symptom

    Job in ACTIVATED for 2 hours. Team force-starts it. The job's dependency wasn't ready. The job fails. The original job would have started correctly 10 minutes later.

    Fix

    ACTIVATED means 'waiting inside a running box'. It's not stuck unless the parent box has been running longer than the total expected duration of all jobs in the box.

Interview Questions on This Topic

  • Q: What does PEND_MACH status mean in AutoSys? (Mid-level)
    PEND_MACH (PE) means the job is waiting for its target machine to become available. The Remote Agent on that machine is not responding to the Event Processor.
    Most common cause (90% of cases): the agent machine's filesystem is 100% full, causing the agent service to stop.
    Other causes:
    - Agent service not running
    - Machine offline
    - Network issue (firewall blocking port 7520)
    - Agent machine overloaded (different from QUE_WAIT)
    Diagnosis order:
    1. SSH to agent machine: df -h (check disk space)
    2. Check agent process: ps -ef | grep autosys
    3. Test connectivity: autoping -m agent-host
    Fix: Clear disk space, restart agent, or fix network issue.
  • Q: What is the difference between ON_HOLD and ON_ICE in AutoSys? (Mid-level)
    This is the most common AutoSys interview question.
    ON_HOLD (OH): Manual hold. When released (OFF_HOLD), the job starts IMMEDIATELY if its starting conditions are currently satisfied.
    ON_ICE (OI): Manual freeze. When released (OFF_ICE), the job does NOT start immediately. It waits for conditions to be re-evaluated in the next scheduling cycle.
    Example: A job that runs at midnight is put ON_ICE at 2 PM. At 3 PM, it's released. It runs at midnight, not at 3 PM.
    When to use:
    - ON_HOLD: Temporary pauses during business hours (database migration, system maintenance) where you want immediate resume
    - ON_ICE: Permanent schedule changes or when you don't want out-of-cycle runs
    Key difference: OFF_HOLD triggers immediate evaluation. OFF_ICE does not.
  • Q: What does ACTIVATED status mean for a job inside a BOX? (Junior)
    ACTIVATED (AC) means the job's parent BOX is in RUNNING state, but the job itself hasn't started yet because its own individual conditions (start_times, condition dependencies) haven't been met.
    ACTIVATED is completely normal. It is NOT an error or a stuck state.
    Example: A box starts at 6 PM. Inside the box, Job A has condition: success(extract_job). If extract_job hasn't finished yet, Job A shows ACTIVATED. When extract_job finishes, Job A will automatically transition to STARTING and then RUNNING.
    When to investigate: Only if the parent box has been RUNNING for longer than the total expected duration of all jobs in the box. Otherwise, be patient.
  • Q: How do you force-start a job that is in FAILURE status? (Mid-level)
    Step-by-step process:
    Step 1: Investigate (do NOT force-start immediately)
    - autorep -J jobname -q | grep std_err_file: locate the error log
    - cat /path/to/error/file: understand the failure
    - autorep -J jobname -d: review the latest run's events (add -r -1 for the previous run)
    Step 2: Fix root cause
    - Code bug → fix and redeploy
    - Missing file → request from upstream
    - Permission error → engage sysadmin
    - Transient network issue → may not need fix
    Step 3: Restart
    - sendevent -E FORCE_STARTJOB -J jobname
    - This bypasses all conditions (time and dependencies)
    - Job runs immediately
    Alternative for retries: If the failure was transient and you just need to retry, sendevent -E RESTART -J jobname is cleaner (signals intent as a retry).
    Never: Force-start a failed job without understanding why it failed. You'll likely just see it fail again.
  • Q: What is the most common cause of jobs going to PEND_MACH in bulk? (Senior)
    A full disk on the agent machine. When 20+ jobs on the same machine all show PEND_MACH simultaneously, the agent machine's filesystem is almost certainly 100% full. The Remote Agent service stops when it can't write logs, and AutoSys marks all jobs targeting that machine as PEND_MACH.
    Diagnosis:
        ssh agent-host 'df -h'
    Look for any partition at 100% usage, especially /var, /tmp, or the agent install directory.
    Fix:
    1. Clear space (rotate logs, delete temp files)
    2. Restart agent: /opt/CA/agent/bin/agent_start
    3. Jobs will automatically resume from PEND_MACH
    Prevention: Set up disk space monitoring at 80% for all agent machines. Configure automatic log rotation.
    Other less common causes: agent process crashed, machine offline, network partition. Always check disk space first: it's the cause 90% of the time.
  • Q: What does QUE_WAIT status mean and how is it different from PEND_MACH? (Senior)
    QUE_WAIT (QU) means the target machine is reachable but currently too busy to accept new jobs. AutoSys is protecting the machine from overload.
    Difference from PEND_MACH:
    - PEND_MACH: Agent unreachable (machine down, agent stopped, network issue). Requires intervention.
    - QUE_WAIT: Agent reachable, but machine load > max_load. Resolves automatically when load drops.
    Cause: The load on the machine exceeds the max_load attribute in its machine definition.
    Check:
        autorep -M machine-name    # shows max_load and current load
    Resolution:
    - Do nothing: jobs will start when load drops below max_load
    - Or increase max_load in the machine definition (update_machine: machine-name, with a higher max_load value)
    - Or reduce load from other jobs
    Common mistake: Restarting the agent when seeing QUE_WAIT. This doesn't help, because the load is the issue, not the agent.

Frequently Asked Questions

What does PEND_MACH mean in AutoSys?

PEND_MACH (PE) means the job is waiting for its target machine to become available. The Remote Agent on the target machine is not responding — usually because the agent service has stopped (commonly due to a full disk), the machine is offline, or there's a network issue.

First diagnostic step: SSH to the agent machine and run df -h. Full disk is the cause in 90% of cases. Clear space and restart the agent.

What does ACTIVATED mean in AutoSys?

ACTIVATED (AC) means the job's parent BOX is in RUNNING state, but the job itself hasn't started yet because its own conditions (start_times, condition attribute) haven't been met. It's a normal state — the job is queued inside a running box. Do NOT force-start a job in ACTIVATED unless the parent box has been running far longer than expected.

What is the difference between ON_HOLD and ON_ICE?

ON_HOLD: When you release a job from ON_HOLD (OFF_HOLD), it will run if its starting conditions are currently satisfied. ON_ICE: When you take a job off ICE (OFF_ICE), it will NOT run immediately — it waits until its conditions reoccur in the next scheduling cycle. ON_ICE is a stronger suspension.

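Example: A midnight job put ON_ICE at 2 PM and released at 3 PM runs at midnight, not 3 PM. The difference shows when the start time passes during the hold: a job put ON_HOLD at 11 PM and released at 1 AM runs at 1 AM, because its midnight start condition was met while it was held. The same job put ON_ICE skips that run and waits for the next midnight.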

How do I restart a failed AutoSys job?
First, investigate why it failed:
  • autorep -J jobname -q | grep std_err_file to find error log
  • cat /path/to/error/file to see the error
  • autorep -J jobname -d to see the latest run's events (add -r -1 for the previous run)

Fix the underlying issue (code bug, missing file, permission). Then run: sendevent -E FORCE_STARTJOB -J jobname

Do NOT restart without investigating — you'll likely just see it fail again immediately.

What does TERMINATED status mean in AutoSys?
TERMINATED (TE) means the job was killed. Common causes:
  • term_run_time exceeded (AutoSys killed it after max runtime)
  • Manual KILLJOB event sent by operator
  • Agent machine went down while job was running

Check the configured limit with autorep -J jobname -q | grep term_run_time, and look for a KILLJOB event in autorep -J jobname -d. If term_run_time was too low, increase it. If it was a manual kill, understand why the job needed killing. After fixing, rerun the job with FORCE_STARTJOB.

What does QUE_WAIT mean and how is it different from PEND_MACH?

QUE_WAIT (QU) means the machine is reachable but currently too busy to accept new jobs — the system load exceeds the machine's max_load setting. AutoSys queues the job until load drops.

Difference from PEND_MACH:
  • PEND_MACH: Agent unreachable (machine down, agent crashed, network issue). Requires intervention.
  • QUE_WAIT: Agent reachable, machine overloaded. Resolves automatically.

Diagnosis: autorep -M machine-name shows max_load and current load. If load > max_load, QUE_WAIT is expected.

Fix: Wait for load to drop, or increase max_load on the machine definition.

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
