AutoSys Status Codes — PEND_MACH Disk Space Trap
47 jobs stuck in PE status? A full /var partition silently kills the Remote Agent.
20+ years shipping production infrastructure and CI/CD at scale. Drawn from code that ran under real load.
- SUCCESS (SU): Job completed with exit code 0. Normal. No action.
- FAILURE (FA): Job exited non-zero. Check std_err_file before restarting.
- RUNNING (RU): Job executing. Monitor runtime vs expected duration.
- PEND_MACH (PE): Agent unreachable. First check: disk space on agent (df -h).
- ON_HOLD (OH): Manually held. OFF_HOLD starts immediately if conditions met.
- ON_ICE (OI): Suspended. OFF_ICE waits for next scheduling cycle.
- ACTIVATED (AC): Job inside RUNNING box, waiting its turn. Normal.
- TERMINATED (TE): Job killed (term_run_time or KILLJOB). Investigate why.
- Production rule: 20+ jobs in PEND_MACH on same machine = disk full. Clear space, restart agent.
AutoSys job statuses are like package tracking codes — each status tells you exactly where your job is in its lifecycle. RUNNING means it's on the truck. SUCCESS means it was delivered. FAILURE means something went wrong. PEND_MACH means the truck broke down.
You'll stare at autorep output every day. SU, FA, RU, OH, OI, PE, TE, AC, IN, ST, QU, RE.
That's 12 status codes. Some are normal. Some need immediate action. Some are traps that operations teams misinterpret for hours.
This article covers each status — what it means, what causes it, and exactly what to do. The difference between ON_HOLD and ON_ICE appears in every AutoSys interview. Master it here.
What AutoSys Status Codes Actually Tell You — and What They Hide
AutoSys status codes are job states tracked by the scheduler, not integers. Process exit codes (0–255) are a separate layer: when a job ends, the agent reports the process exit code and AutoSys maps it to a status such as SUCCESS, FAILURE, or TERMINATED, so the status alone can hide the raw code. In practice, the most dangerous status is PEND_MACH (pending machine): the job never started because the agent machine it needs could not be reached. A full disk on the agent host can silently stop the agent, leaving jobs in PEND_MACH with no exit code at all, because no process ever ran. This is not a job failure; it is an infrastructure failure that looks like a scheduling delay. Senior engineers treat PEND_MACH as a canary for disk space or agent health, not a job issue. Use status codes to distinguish job logic failures (non-zero exit) from execution failures (PEND_MACH, TERMINATED). In real systems, monitoring PEND_MACH trends catches disk-full scenarios before they cascade.
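Tracking the PEND_MACH count can be sketched in a few lines of shell. This is a minimal sketch, not a definitive monitor: the function names are hypothetical, and it assumes the two-letter status appears as its own whitespace-separated field in an autorep-style summary, and that the input has already been limited to jobs for one machine.

```shell
#!/bin/sh
# Count jobs whose status field is PE in an autorep-style summary on stdin.
# Assumption: the status is a standalone field somewhere after the job name.
count_pend_mach() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i == "PE") { n++; break }
    }
  }
  END { print n + 0 }'
}

# Report whether the PE count has reached a threshold (default 20, taken
# from the "20+ jobs" rule of thumb). Returns 1 when the threshold is hit.
pend_mach_alert() {
  pm_threshold="${1:-20}"
  pm_count=$(count_pend_mach)
  if [ "$pm_count" -ge "$pm_threshold" ]; then
    echo "ALERT: $pm_count jobs in PEND_MACH, check disk space on the agent host"
    return 1
  fi
  echo "OK: $pm_count jobs in PEND_MACH"
}
```

A possible use is `autorep -J ALL | pend_mach_alert 20` from a scheduled check, with the non-zero return code feeding whatever alerting the site already has.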
Normal lifecycle statuses
These are the statuses a job moves through during a healthy execution. You should see them regularly and not be alarmed.
INACTIVE (IN): Job hasn't been triggered yet. Waiting for its schedule or conditions. This is the starting state for most scheduled jobs.
ACTIVATED (AC): The job's parent BOX is in RUNNING state, but the job itself hasn't started yet because its own conditions aren't met. This is normal — it means the box is active and the job is queued.
STARTING (ST): Brief transition state. The Event Processor sent the start command to the agent, and the agent is spinning up the process. Usually lasts under 3 seconds.
RUNNING (RU): Job is executing on the agent machine. Monitor runtime against expected duration. A job that runs 4x longer than usual may be hung.
SUCCESS (SU): Job completed with exit code 0. Normal completion. No action needed.
RESTART (RE): Job is being automatically retried after a failure (n_retrys configured). This is normal for jobs with retry logic. Only be concerned if the final attempt also fails.
- INACTIVE → ACTIVATED → STARTING → RUNNING → SUCCESS (normal flow)
- FAILURE, TERMINATED, PEND_MACH = delivery exceptions
- ON_HOLD/ON_ICE = manual holds (customer requested hold)
- The Event Processor is the dispatch center. Agents are the delivery trucks.
All status codes with abbreviations
The autorep command uses two-letter abbreviations for status codes. Here they all are, including the less common ones:
Normal/Interim: AC (ACTIVATED), IN (INACTIVE), ST (STARTING), RU (RUNNING), SU (SUCCESS), RE (RESTART)
Problem/Unusual: FA (FAILURE), PE (PEND_MACH), TE (TERMINATED), QU (QUE_WAIT)
Manual holds: OH (ON_HOLD), OI (ON_ICE)
Less common: DE (DELAYED — job waiting for start time), OP (OPERATOR — operator held), SP (STOPPED), UP (UP STREAM — waiting on upstream), WD (WAIT_DEP — waiting on file watcher)
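When scripting around autorep output, a small lookup makes the abbreviations readable. The sketch below covers only the twelve common codes listed above; the function name `expand_status` is hypothetical, and anything else is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Expand a two-letter autorep status abbreviation to its full name.
# Only the twelve common codes from this article are mapped.
expand_status() {
  case "$1" in
    AC) echo ACTIVATED ;;
    IN) echo INACTIVE ;;
    ST) echo STARTING ;;
    RU) echo RUNNING ;;
    SU) echo SUCCESS ;;
    RE) echo RESTART ;;
    FA) echo FAILURE ;;
    PE) echo PEND_MACH ;;
    TE) echo TERMINATED ;;
    QU) echo QUE_WAIT ;;
    OH) echo ON_HOLD ;;
    OI) echo ON_ICE ;;
    *)  echo "UNKNOWN($1)"; return 1 ;;
  esac
}
```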
The difference between ON_HOLD and ON_ICE is the most misunderstood. ON_HOLD + OFF_HOLD = start immediately if conditions met. ON_ICE + OFF_ICE = wait for next scheduling cycle.
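The four operations map onto four sendevent events: JOB_ON_HOLD, JOB_OFF_HOLD, JOB_ON_ICE, and JOB_OFF_ICE. As a hedged sketch, the helper below only prints the command so it can be reviewed before anyone runs it; the function name and the action words (hold, release, ice, thaw) are hypothetical conveniences, not AutoSys terms.

```shell
#!/bin/sh
# Print (do not execute) the sendevent command for a hold or ice action.
# Usage: hold_event_cmd <hold|release|ice|thaw> <job_name>
hold_event_cmd() {
  case "$1" in
    hold)    he_event=JOB_ON_HOLD ;;   # job will not start until released
    release) he_event=JOB_OFF_HOLD ;;  # starts immediately if conditions are met
    ice)     he_event=JOB_ON_ICE ;;    # job is suspended
    thaw)    he_event=JOB_OFF_ICE ;;   # waits for the next scheduling cycle
    *) echo "usage: hold_event_cmd <hold|release|ice|thaw> <job_name>" >&2
       return 2 ;;
  esac
  printf 'sendevent -E %s -J %s\n' "$he_event" "$2"
}
```

Printing rather than executing keeps the sketch safe to try on any machine; `hold_event_cmd thaw nightly_load` shows the command an operator would then run by hand.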
The problem statuses and what to do
These statuses almost always require your attention. Each has a specific diagnosis path.
The Status Transition Matrix: Why Your Job Didn't Fail — It Just Never Started
Most engineers treat AutoSys status codes as final verdicts. They're not. They're snapshots of a sequence that can stall at any point. A job stuck in ACTIVATED for 15 minutes isn't failing; it's waiting on a condition that never evaluated true. A job bouncing between QUE_WAIT and STARTING is a dispatcher bottleneck, not a script error.
The real skill is reading the transition path, not the final status. When you see INACTIVE after a failure, that's the system cleaning house — your job already exhausted its retry limit. When you see TERMINATED with exit code 0, someone sent a kill signal mid-execution. The exit code is meaningless then.
Build a mental model of the state machine. Every status has a permitted predecessor and successor. If you see a transition that shouldn't happen — like SUCCESS flipping to INACTIVE without an intervening failure — you're looking at a force-restart or a database corruption. Both require human intervention, not automation.
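One way to make that mental model concrete is a small transition check. The table below is a deliberately simplified assumption built from the normal flow described in this article plus the failure and retry paths; it is not the scheduler's actual state machine, and the function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if FROM -> TO is in a simplified table of expected transitions,
# 1 otherwise. The table is an illustrative assumption, not AutoSys's own.
is_expected_transition() {
  case "$1>$2" in
    "INACTIVE>ACTIVATED")  return 0 ;;
    "INACTIVE>STARTING")   return 0 ;;
    "ACTIVATED>STARTING")  return 0 ;;
    "STARTING>RUNNING")    return 0 ;;
    "RUNNING>SUCCESS")     return 0 ;;
    "RUNNING>FAILURE")     return 0 ;;
    "RUNNING>TERMINATED")  return 0 ;;
    "FAILURE>RESTART")     return 0 ;;
    "RESTART>STARTING")    return 0 ;;
    *)                     return 1 ;;
  esac
}

# Print a line only for transitions that a human should look at.
flag_transition() {
  if ! is_expected_transition "$1" "$2"; then
    echo "REVIEW: unexpected transition $1 -> $2"
  fi
}
```

With this table, `flag_transition SUCCESS INACTIVE` prints a review line while `flag_transition STARTING RUNNING` prints nothing. A real deployment would extend the table for holds, PEND_MACH, and box behavior.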
Exit Codes vs. AutoSys Status Codes: The Lie You've Been Told
Here's what nobody tells you: AutoSys status codes and process exit codes are different layers of reality. A job can exit with code 0 (success to the OS) but AutoSys flags it as FAILURE because of exit code mapping. Conversely, a script that dumps core with exit code 139 can be registered as SUCCESS if the job profile says 'ignore_exit_code: 1'.
The exit_code_mapping parameter is a minefield. Default behavior: any non-zero exit is failure. But production systems routinely map exit code 2 to SUCCESS because some old Perl script uses 2 for "no data found" — which is not an error. That's your problem. You're debugging an AutoSys FAILURE that is actually a business logic success.
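Stop looking at status codes in isolation. Cross-reference them with the exit code recorded for the run (the detailed report from autorep -J JOBNAME -d shows it alongside the status). If you see FAILURE with exit code 0, suspect the exit code mapping or something outside the script's own logic; note that max_run_alarm only raises an alarm, while term_run_time is the attribute that kills a long-running job. If you see SUCCESS with exit code 137, someone sent SIGKILL to the process and AutoSys swallowed it.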
Map your exit codes explicitly. Put it in version control. Treat the defaults as suspicious.
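Codes such as 137 and 139 follow the shell convention that a process killed by signal N is reported as 128 + N. A minimal sketch for decoding them is below; the function name is hypothetical, and the signal name comes from the shell's own `kill -l`, falling back to "unknown" if the shell cannot name it.

```shell
#!/bin/sh
# Describe a process exit code, decoding 128+N as "killed by signal N".
describe_exit() {
  de_code="$1"
  if [ "$de_code" -eq 0 ]; then
    echo "exit 0: normal exit"
  elif [ "$de_code" -gt 128 ]; then
    de_sig=$((de_code - 128))
    # kill -l N prints the signal name for number N in common shells.
    de_name=$(kill -l "$de_sig" 2>/dev/null) || de_name=unknown
    echo "exit $de_code: killed by signal $de_sig ($de_name)"
  else
    echo "exit $de_code: script-defined non-zero exit"
  fi
}
```

Running `describe_exit 137` reports signal 9, and `describe_exit 139` reports signal 11, which is why neither should be accepted as a clean SUCCESS without a look at the mapping.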
The 2 AM PEND_MACH That Wasn't a Network Problem
- PEND_MACH on multiple jobs → same machine → disk space is #1 cause.
- Diagnosis order: disk space → agent process → network → everything else.
- df -h is the first command, not the last. It saves hours.
- Agent machines need disk monitoring. Full disk stops the agent silently.
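The disk check in the list above can be automated on the agent host. This is a sketch under stated assumptions: it reads POSIX-format `df -P` output, the 90% default threshold and the function name are hypothetical, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Print "<mount point> <capacity>" for every filesystem at or above a
# usage threshold. Reads `df -P` output on stdin; default threshold is 90.
full_filesystems() {
  ff_limit="${1:-90}"
  awk -v limit="$ff_limit" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)          # strip the percent sign from the Capacity column
    if (pct + 0 >= limit) print $6, $5
  }'
}
```

A typical call is `df -P | full_filesystems 90`; any output at all, especially a line for /var, is a reason to clear space before touching the network or the scheduler.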
autorep -J JOBNAME -q | grep std_err_file
autorep -J JOBNAME -L 10 | grep -A 5 FAILURE
Key takeaways
Common mistakes to avoid
- Confusing ON_HOLD and ON_ICE
- Not checking PEND_MACH machine disk space first
- Restarting jobs in FAILURE without first reading the error log
- Not distinguishing INACTIVE from ON_ICE
- Treating ACTIVATED as a stuck state
Interview Questions on This Topic
What does PEND_MACH status mean in AutoSys?
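PEND_MACH means the agent machine is unreachable, so the job cannot start. Diagnose in this order:
1. Check disk space: df -h
2. Check agent process: ps -ef | grep autosys
3. Test connectivity: autoping -m agent-host
Fix: Clear disk space, restart agent, or fix network issue.
Frequently Asked Questions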