AutoSys Status Codes: 12 Statuses You Must Know
- PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.
- ON_HOLD + OFF_HOLD = immediate start. ON_ICE + OFF_ICE = next cycle only. Master this difference.
- ACTIVATED is normal — job waiting inside running box. Don't force-start it.
- SUCCESS (SU): Job completed with exit code 0. Normal. No action.
- FAILURE (FA): Job exited non-zero. Check std_err_file before restarting.
- RUNNING (RU): Job executing. Monitor runtime vs expected duration.
- PEND_MACH (PE): Agent unreachable. First check: disk space on agent (df -h).
- ON_HOLD (OH): Manually held. OFF_HOLD starts immediately if conditions met.
- ON_ICE (OI): Suspended. OFF_ICE waits for next scheduling cycle.
- ACTIVATED (AC): Job inside RUNNING box, waiting its turn. Normal.
- TERMINATED (TE): Job killed (term_run_time or KILLJOB). Investigate why.
- Production rule: 20+ jobs in PEND_MACH on same machine = disk full. Clear space, restart agent.
Status Codes: 60-Second Diagnosis
FAILURE: find out why
- autorep -J JOBNAME -q | grep std_err_file
- autorep -J JOBNAME -d | grep -A 5 FAILURE
PEND_MACH: check disk first
- ssh agent-host 'df -h'
- autoping -m agent-host
Check if job is ON_HOLD or ON_ICE
- autorep -J JOBNAME | grep -E 'OH|OI'
- sendevent -E JOB_OFF_HOLD -J JOBNAME (or JOB_OFF_ICE)
Job stuck at ACTIVATED too long
- autorep -J JOBNAME -q | grep condition
- autorep -J PARENT_BOX -d | grep status
Production Debug Guide
FA, PE, TE, OH/OI: the statuses that need your attention
You'll stare at autorep output every day. SU, FA, RU, OH, OI, PE, TE, AC, IN, ST, QU, RE.
That's 12 status codes. Some are normal. Some need immediate action. Some are traps that operations teams misinterpret for hours.
This article covers each status — what it means, what causes it, and exactly what to do. The difference between ON_HOLD and ON_ICE appears in every AutoSys interview. Master it here.
Normal lifecycle statuses
These are the statuses a job moves through during a healthy execution. You should see them regularly and not be alarmed.
INACTIVE (IN): Job hasn't been triggered yet. Waiting for its schedule or conditions. This is the starting state for most scheduled jobs.
ACTIVATED (AC): The job's parent BOX is in RUNNING state, but the job itself hasn't started yet because its own conditions aren't met. This is normal — it means the box is active and the job is queued.
STARTING (ST): Brief transition state. The Event Processor sent the start command to the agent, and the agent is spinning up the process. Usually lasts under 3 seconds.
RUNNING (RU): Job is executing on the agent machine. Monitor runtime against expected duration. A job that runs 4x longer than usual may be hung.
SUCCESS (SU): Job completed with exit code 0. Normal completion. No action needed.
RESTART (RE): Job is being automatically retried after a failure (n_retrys configured). This is normal for jobs with retry logic. Only be concerned if the final attempt also fails.
# Check status of a single job
autorep -J daily_report
# Check status with full details (start/end times, exit code)
autorep -J daily_report -d
# List all jobs and their current status
autorep -J %
# Filter by status (SU = SUCCESS, FA = FAILURE, RU = RUNNING)
# -s selects the summary report, not a status, so filter the output with grep
autorep -J % -s | grep -w SU
autorep -J % -s | grep -w FA
autorep -J % -s | grep -w RU
# Check if a job is hung: compare its last start time with the usual duration
autorep -J long_job -d
------------------------------------------------------------
daily_report SU 0 03/19/2026 02:00:01 03/19/2026 02:08:43
weekly_reconcile FA 1 03/19/2026 03:00:00 03/19/2026 03:01:22
monthly_summary OH -- -- --
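Because the status sits in its own column, a report like the sample above can be filtered with awk. This is a minimal sketch, assuming the simplified column layout of the sample (job name, status, exit code, then timestamps); real autorep output may order its columns differently, so adjust the field number to match. The helper name `jobs_with_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names of jobs whose status column matches $1.
# Assumes the simplified layout of the sample report: name, status, exit code, ...
jobs_with_status() {
    awk -v want="$1" '$2 == want { print $1 }'
}

# Sample report text copied from the article; in practice, pipe autorep output in.
sample_report='daily_report SU 0 03/19/2026 02:00:01 03/19/2026 02:08:43
weekly_reconcile FA 1 03/19/2026 03:00:00 03/19/2026 03:01:22
monthly_summary OH -- -- --'

printf '%s\n' "$sample_report" | jobs_with_status FA   # prints weekly_reconcile
```

Matching on a whole field avoids false hits on job names that merely contain the letters FA.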
- INACTIVE → ACTIVATED → STARTING → RUNNING → SUCCESS (normal flow)
- FAILURE, TERMINATED, PEND_MACH = delivery exceptions
- ON_HOLD/ON_ICE = manual holds (customer requested hold)
- The Event Processor is the dispatch center. Agents are the delivery trucks.
All status codes with abbreviations
The autorep command uses two-letter abbreviations for status codes. Here they all are, including the less common ones:
Normal/Interim: AC (ACTIVATED), IN (INACTIVE), ST (STARTING), RU (RUNNING), SU (SUCCESS), RE (RESTART)
Problem/Unusual: FA (FAILURE), PE (PEND_MACH), TE (TERMINATED), QU (QUE_WAIT)
Manual holds: OH (ON_HOLD), OI (ON_ICE)
Less common: DE (DELAYED: job waiting for start time), OP (OPERATOR: operator held), SP (STOPPED), UP (UP_STREAM: waiting on upstream), WD (WAIT_DEP: waiting on file watcher)
The difference between ON_HOLD and ON_ICE is the most misunderstood. ON_HOLD + OFF_HOLD = start immediately if conditions met. ON_ICE + OFF_ICE = wait for next scheduling cycle.
AC ACTIVATED -- Box is RUNNING but this job hasn't started yet
DE DELAYED -- Job waiting to meet its start time criteria
FA FAILURE -- Job completed with non-zero exit code
IN INACTIVE -- Job hasn't been triggered; waiting for schedule
OH ON_HOLD -- Manually held; won't run until released
OI ON_ICE -- Suspended; won't run even when conditions reappear
OP OPERATOR -- Operator held (sendevent -E HOLDJOB)
PE PEND_MACH -- Waiting for target machine (agent offline)
QU QUE_WAIT -- Waiting for available machine load/resources
RE RESTART -- Job is being restarted after failure
RU RUNNING -- Job is currently executing
ST STARTING -- Job sent to agent; waiting for execution to begin
SU SUCCESS -- Job completed with exit code 0
TE TERMINATED -- Job was killed (term_run_time, KILLJOB event, etc.)
UP UP_STREAM -- Waiting on upstream dependencies
WD WAIT_DEP -- Waiting on file watcher condition
The problem statuses and what to do
These statuses almost always require your attention. Each has a specific diagnosis path.
# FAILURE: job failed
# Check the error log first
cat /logs/autosys/daily_report.err
# Then check the event log
autorep -J daily_report -d
# Review recent runs
autorep -J daily_report -run 7
# Retry after fixing the underlying issue
sendevent -E FORCE_STARTJOB -J daily_report

# PEND_MACH: agent machine offline
# First: check disk space on agent (MOST COMMON)
ssh prod-server-01 'df -h'
# Then check agent status
autoping -m prod-server-01
autorep -M prod-server-01
# Restart agent after fixing issue
/opt/CA/agent/bin/agent_start

# TERMINATED: job was killed
# Check if killed by term_run_time or manually
autorep -J slow_job -d | grep -E 'term_run_time|KILLJOB'
# If due to term_run_time, increase it (JIL):
update_job: slow_job
term_run_time: 180

# QUE_WAIT: machine overloaded
# Check machine load limits
autorep -M machine-name
autorep -J jobname -q | grep load
ERROR: Database connection timeout after 30s
PEND_MACH — prod-server-01: MISSING (agent service down)
Disk usage on prod-server-01: /apps 100% full — agent stopped
TERMINATED — term_run_time: 60 (exceeded). Job ran for 62 minutes.
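The runtime comparison behind a term_run_time kill is simple arithmetic. The sketch below assumes both timestamps fall on the same day and are given as HH:MM:SS; the helper names `elapsed_seconds` and `exceeds_term_run_time` are hypothetical, and term_run_time is taken in minutes as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: seconds between two HH:MM:SS times on the same day.
elapsed_seconds() {
    # awk reads "08" as decimal 8, avoiding the shell's octal parsing of leading zeros
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, s, ":"); split($2, e, ":")
        print (e[1]*3600 + e[2]*60 + e[3]) - (s[1]*3600 + s[2]*60 + s[3])
    }'
}

# Hypothetical helper: succeed if a runtime in seconds ($1) exceeds
# a term_run_time given in minutes ($2).
exceeds_term_run_time() {
    [ "$1" -gt $(( $2 * 60 )) ]
}

run=$(elapsed_seconds 02:00:01 02:08:43)
echo "daily_report ran for ${run}s"

if exceeds_term_run_time $(( 62 * 60 )) 60; then
    echo "a 62-minute run exceeds term_run_time 60"
fi
```

The same comparison works for spotting hung jobs: substitute the job's usual duration for term_run_time.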
| Status | Abbrev. | Normal or problem? | Typical cause | Action needed? |
|---|---|---|---|---|
| INACTIVE | IN | Normal | Waiting for schedule/conditions | No |
| ACTIVATED | AC | Normal | Box is RUNNING, job waiting its turn | No — but monitor if box runs too long |
| STARTING | ST | Normal | Agent spinning up process | No — brief transition |
| RUNNING | RU | Normal | Job executing | No — but monitor duration |
| SUCCESS | SU | Normal | Exit code 0 | No |
| FAILURE | FA | Problem | Non-zero exit code | Yes — investigate and fix |
| ON_HOLD | OH | Normal (if intentional) | Manual hold (sendevent) | Check if intentional |
| ON_ICE | OI | Normal (if intentional) | Manual freeze (sendevent) | Check if intentional |
| PEND_MACH | PE | Problem | Agent unreachable (full disk #1) | Yes — check agent machine |
| QUE_WAIT | QU | Possible problem | Machine load > max_load | Monitor — resolves automatically |
| TERMINATED | TE | Problem (usually) | Killed by term_run_time or manual | Investigate cause |
| RESTART | RE | Normal | Auto-retry after failure | Monitor for continued failure |
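The table can be turned into a small triage function for monitoring scripts. This is a sketch only: the function name `triage` and the category labels are hypothetical, and the mapping simply mirrors the table's "Normal or problem?" and "Action needed?" columns.

```shell
#!/bin/sh
# Hypothetical helper: map a two-letter status code to a triage category
# that mirrors the table above.
triage() {
    case "$1" in
        IN|AC|ST|RU|SU|RE) echo "normal" ;;
        FA|PE|TE)          echo "problem" ;;
        QU)                echo "monitor" ;;
        OH|OI)             echo "check-intent" ;;
        *)                 echo "unknown" ;;
    esac
}

for code in SU FA QU OI; do
    printf '%s -> %s\n' "$code" "$(triage "$code")"
done
```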
🎯 Key Takeaways
- PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.
- ON_HOLD + OFF_HOLD = immediate start. ON_ICE + OFF_ICE = next cycle only. Master this difference.
- ACTIVATED is normal — job waiting inside running box. Don't force-start it.
- FAILURE requires investigation before restart. Check error log first, always.
- QUE_WAIT = machine overload. PEND_MACH = agent unreachable. Don't confuse them.
- TERMINATED = job was killed. Check if term_run_time or manual KILLJOB.
- autorep -d shows detail (start/end times). autorep -q shows JIL definition. Use both.
Interview Questions on This Topic
- Q: What does PEND_MACH status mean in AutoSys? (Mid-level)
- Q: What is the difference between ON_HOLD and ON_ICE in AutoSys? (Mid-level)
- Q: What does ACTIVATED status mean for a job inside a BOX? (Junior)
- Q: How do you force-start a job that is in FAILURE status? (Mid-level)
- Q: What is the most common cause of jobs going to PEND_MACH in bulk? (Senior)
- Q: What does QUE_WAIT status mean and how is it different from PEND_MACH? (Senior)
Frequently Asked Questions
What does PEND_MACH mean in AutoSys?
PEND_MACH (PE) means the job is waiting for its target machine to become available. The Remote Agent on the target machine is not responding — usually because the agent service has stopped (commonly due to a full disk), the machine is offline, or there's a network issue.
First diagnostic step: SSH to the agent machine and run df -h. Full disk is the cause in 90% of cases. Clear space and restart the agent.
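The disk check can be scripted so that full filesystems stand out. A minimal sketch, assuming POSIX `df -P` output (one filesystem per line, capacity in the fifth column); the helper name `full_filesystems`, the 90 percent default, and the sample figures are illustrative, not taken from a real agent.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print mount points
# whose usage is at or above the threshold (default 90 percent).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5; sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6 " " use "%"
    }'
}

# Live check on the agent host (host name is a placeholder):
#   ssh agent-host 'df -P' | full_filesystems 90

# Demo with made-up df output
sample_df='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 51475068 20590027 30885041 40% /
/dev/sdb1 103081248 103081248 0 100% /apps'

printf '%s\n' "$sample_df" | full_filesystems 90   # prints: /apps 100%
```

Mount points that contain spaces would need extra handling; the sketch ignores that case.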
What does ACTIVATED mean in AutoSys?
ACTIVATED (AC) means the job's parent BOX is in RUNNING state, but the job itself hasn't started yet because its own conditions (start_times, condition attribute) haven't been met. It's a normal state — the job is queued inside a running box. Do NOT force-start a job in ACTIVATED unless the parent box has been running far longer than expected.
What is the difference between ON_HOLD and ON_ICE?
ON_HOLD: When you release a job from ON_HOLD (OFF_HOLD), it will run if its starting conditions are currently satisfied. ON_ICE: When you take a job off ICE (OFF_ICE), it will NOT run immediately — it waits until its conditions reoccur in the next scheduling cycle. ON_ICE is a stronger suspension.
Example: A midnight job put ON_ICE at 2 PM and released at 3 PM runs at midnight, not 3 PM. The same job put ON_HOLD at 2 PM and released at 3 PM would run immediately at 3 PM (if conditions are met).
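The release step differs only in the event name, which makes it easy to send the wrong one. The sketch below prints the matching sendevent command rather than running it; `release_command` is a hypothetical helper, and the event names JOB_OFF_HOLD and JOB_OFF_ICE are the full sendevent forms of the OFF_HOLD and OFF_ICE shorthand used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the sendevent command that releases a job,
# chosen from its current two-letter status code.
release_command() {
    st=$1
    job=$2
    case "$st" in
        OH) echo "sendevent -E JOB_OFF_HOLD -J $job" ;;  # runs at once if conditions are met
        OI) echo "sendevent -E JOB_OFF_ICE -J $job" ;;   # waits for the next scheduling cycle
        *)  echo "job $job is not held (status $st)" >&2
            return 1 ;;
    esac
}

release_command OH monthly_summary
release_command OI monthly_summary
```

Printing the command first gives an operator the chance to confirm which behavior they are about to trigger.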
How do I restart a failed AutoSys job?
Investigate first:
- autorep -J jobname -q | grep std_err_file to find the error log
- cat /path/to/error/file to see the error
- autorep -J jobname -r -1 to see the previous run
Fix the underlying issue (code bug, missing file, permission). Then:
- sendevent -E FORCE_STARTJOB -J jobname
Do NOT restart without investigating — you'll likely just see it fail again immediately.
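Locating the error log is the step most worth scripting. A minimal sketch, assuming the job definition prints `std_err_file: <path>` on its own line; the helper name `err_file_from_jil` and the sample JIL definition are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a JIL definition on stdin and print the std_err_file path,
# stripping any surrounding double quotes.
err_file_from_jil() {
    sed -n 's/^[[:space:]]*std_err_file:[[:space:]]*//p' | tr -d '"'
}

# Live use (job name is a placeholder):
#   autorep -J jobname -q | err_file_from_jil

# Demo with a made-up JIL definition
sample_jil='insert_job: daily_report   job_type: CMD
command: /apps/bin/daily_report.sh
machine: prod-server-01
std_out_file: /logs/autosys/daily_report.out
std_err_file: /logs/autosys/daily_report.err'

printf '%s\n' "$sample_jil" | err_file_from_jil   # prints /logs/autosys/daily_report.err
```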
What does TERMINATED status mean in AutoSys?
TERMINATED (TE) means the job was killed before it finished. Common causes:
- term_run_time exceeded (AutoSys killed it after max runtime)
- Manual KILLJOB event sent by operator
- Agent machine went down while job was running
Check autorep -J jobname -d | grep -E 'term_run_time|KILLJOB' to see the cause. If term_run_time was too low, increase it. If manual kill, understand why the job needed killing. After fixing, use RESTART or FORCE_STARTJOB to rerun.
What does QUE_WAIT mean and how is it different from PEND_MACH?
QUE_WAIT (QU) means the machine is reachable but currently too busy to accept new jobs — the system load exceeds the machine's max_load setting. AutoSys queues the job until load drops.
- PEND_MACH: Agent unreachable (machine down, agent crashed, network issue). Requires intervention.
- QUE_WAIT: Agent reachable, machine overloaded. Resolves automatically.
Diagnosis: autorep -M machine-name shows max_load and current load. If load > max_load, QUE_WAIT is expected.
Fix: Wait for load to drop, or increase max_load on the machine definition.
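The distinction can be written down as a small decision function. This is a simplified model of the rule described above, not the scheduler's actual logic: `waiting_status` is a hypothetical helper, its inputs are supplied by hand, and loads are treated as whole numbers.

```shell
#!/bin/sh
# Hypothetical helper modelling the distinction described above.
# $1 = "yes" if the agent answers autoping, $2 = current load, $3 = max_load
waiting_status() {
    if [ "$1" != "yes" ]; then
        echo "PEND_MACH"   # agent unreachable: needs intervention
    elif [ "$2" -gt "$3" ]; then
        echo "QUE_WAIT"    # reachable but overloaded: clears on its own
    else
        echo "none"        # machine can accept the job
    fi
}

waiting_status no 10 100     # prints PEND_MACH
waiting_status yes 120 100   # prints QUE_WAIT
waiting_status yes 40 100    # prints none
```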
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.