
AutoSys Architecture: The 1 Component That Stops All Jobs

📍 Part of: AutoSys → Topic 2 of 30
AutoSys architecture and components explained with real outage stories.
🧑‍💻 Beginner-friendly — no prior DevOps experience needed
In this tutorial, you'll learn
  • AutoSys has four core components: Event Server (database), Event Processor (scheduler), Remote Agents (job executors), and client tools.
  • The Event Server is the single source of truth — every job definition, event, and status lives there.
  • The Event Processor continuously evaluates job conditions and triggers agents — it never executes jobs directly.
[Figure: AutoSys Architecture, Component Interaction. Windows and UNIX agents (autosys_agent, port 7520) sit on either side of the CA Workload Automation AE server (UNIX or Windows), which contains the Scheduler, the Event Server (database), the Web Server (WCC), and the Application Server. Client machines running jil, autorep, sendevent, or an SDK application connect to the Application Server.]
Quick Answer
  • AutoSys components: Event Server (database of all jobs/events), Event Processor (scheduler daemon), Remote Agent (job executor), client tools (jil, autorep, sendevent)
  • Event Server stores all job definitions, event history, machine definitions; source of truth for entire AutoSys environment
  • Event Processor runs continuously, evaluates conditions, triggers jobs on Remote Agents — only ONE per AutoSys instance
  • Performance: the Event Processor polls the Event Server (roughly every 30 seconds), so allow for that delay in any near-real-time alerting
  • Production trap: Remote Agent machine runs out of disk space — agent crashes, all jobs on that machine go PEND_MACH (stuck), no auto-recovery
  • Biggest mistake: Running multiple Event Processors — corrupts job state, leads to duplicate job execution
🚨 START HERE

AutoSys Component Debug Cheat Sheet

Fast diagnostics for AutoSys architecture issues in production environments.
🟡

Jobs stuck in PEND_MACH — all jobs on one machine

Immediate Action: Check Remote Agent status and disk space
Commands:
ps -ef | grep -i autosys_agent
df -h /var /tmp /opt
Fix Now: Restart the agent: `/etc/init.d/autosys_agent restart`. Then force start the stuck jobs: `for job in $(autorep -J % -m MACHINE_NAME -d | grep PEND_MACH | awk '{print $1}'); do sendevent -E FORCE_STARTJOB -J "$job"; done`
🟡

No jobs starting anywhere — Event Processor likely down

Immediate Action: Check Event Processor status
Commands:
autoping
ps -ef | grep eventor
Fix Now: Start the Event Processor: `eventor`. Check Event Server connectivity: `sqlplus autosys_user@autosys_db` (Oracle) or `isql -S autosys` (Sybase).
🟡

Jobs stuck in RUNNING but log shows they completed

Immediate Action: Check whether the Remote Agent can write back to the Event Server
Commands:
tail -100 /var/log/autosys/agent.log | grep -i error
telnet EVENT_SERVER_HOST <port>   # use the Event Server port set in the agent configuration
Fix Now: Restart the agent. Check network connectivity between the agent and the Event Server. A firewall rule may have changed.
🟡

Job status inconsistent — duplicates, missing events

Immediate Action: Check for duplicate Event Processors
Commands:
ps -ef | grep eventor | grep -v grep | wc -l
autorep -J % -q | grep -i duplicate
Fix Now: Kill the duplicate eventor processes. Be careful: `pkill -f eventor` kills every match, including the instance you want to keep, so start a single Event Processor again afterwards. Use a lock file in the startup script to prevent multiple instances.
🟠

High Event Server CPU — slow job scheduling

Immediate Action: Check the Event Server database for large tables or missing indexes
Commands:
sqlplus autosys_user@autosys_db <<EOF
SELECT table_name, num_rows FROM user_tables WHERE table_name LIKE 'AE%';
EOF
ls -lh $AUTOSYS/log/event_processor.log
Fix Now: Purge old events: `db_purge_events -date 'MM/DD/YYYY'`. Run `analyze table` on the Event Server tables. Shorten the event history retention period to reduce table size.
Production Incident

The Full Disk That Froze 500 Jobs

A Remote Agent's filesystem reached 100% capacity during a nightly batch window. The agent service crashed silently. All 500 jobs assigned to that machine went into PEND_MACH status. No alarms fired. The team discovered the issue 6 hours later when the morning reports never arrived.
Symptom: The AutoSys Web UI showed 500 jobs in PEND_MACH status. autorep -J % -d showed jobs waiting for machine 'prod-db-01'. The machine was still running, but the AutoSys agent was not responding. There was no CPU spike, no memory pressure, and no network issue. The only symptom was a full /var filesystem.
Assumption: The team assumed the machine was healthy because it was pingable and SSH worked. They didn't know that the Remote Agent writes log files to /var and crashes silently when disk space runs out. They also had no monitoring on agent disk usage.
Root cause: The Remote Agent writes logs to /var/log/autosys/agent.log by default. A misconfigured application job generated 50GB of debug output overnight, filling /var. When the disk reached 100%, the agent service tried to write to the log, failed, and crashed. The Event Processor sent a heartbeat check to the agent, got no response, and marked all jobs on that machine as PEND_MACH (pending machine). No new jobs could start on that machine. Jobs that were already running kept running, but the crashed agent could not report their completion, so they stayed in RUNNING state until a human intervened.
Fix:
1. Added disk space monitoring for all agent machines: alert when /var > 80% full.
2. Configured log rotation for agent logs: logrotate with compression and 7-day retention.
3. Set disk_check_interval in agent config to check free space before writing.
4. For the offending job, limited log output to 100MB and added log rotation in the script.
5. Added a cron job that restarts the AutoSys agent if it's down: if ! pgrep -x autosys_agent >/dev/null; then /etc/init.d/autosys_agent start; fi. (A plain ps -ef | grep -q 'autosys_agent' test always succeeds because grep matches its own process.)
6. Documented PEND_MACH resolution steps: check disk space, restart agent, then sendevent -E FORCE_STARTJOB for stuck jobs.
Key Lesson
Remote Agent crashes are silent. The agent stops without raising an alarm. Jobs go PEND_MACH without notification.
Always monitor disk space on agent machines. An 80% full alert is your early warning. At 95%, trigger immediate action.
PEND_MACH does not auto-resolve. Even if the agent restarts, jobs remain PEND_MACH until manually forced.
The Event Processor cannot distinguish between a crashed agent and a slow agent. It just waits for heartbeat timeout.
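The 80% and 95% thresholds from the lessons above can be sketched as a small disk check. This is a minimal sketch, assuming POSIX `df -P` output and mount points without spaces; the function names `fs_over_threshold` and `report_disk` are hypothetical helpers, not AutoSys commands.

```shell
# fs_over_threshold PCT: read `df -P` output on stdin and print
# "<mount> <used%>" for every filesystem at or above PCT percent used.
# (Hypothetical helper; not part of AutoSys.)
fs_over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    used = $5; sub(/%/, "", used)          # column 5 of df -P is Capacity, e.g. "91%"
    if (used + 0 >= limit) print $6, used  # column 6 is the mount point
  }'
}

# report_disk: warn at 80%, escalate at 95%, for the paths used in the cheat sheet.
report_disk() {
  df -P /var /tmp /opt 2>/dev/null | fs_over_threshold 80 |
  while read -r mount used; do
    if [ "$used" -ge 95 ]; then
      echo "CRITICAL: $mount is ${used}% full"
    else
      echo "WARNING: $mount is ${used}% full"
    fi
  done
}
```

Calling `report_disk` from cron on each agent machine, and mailing any output, would give the early warning the incident lacked. Reading `df` output from stdin keeps the parsing logic testable without touching a real filesystem.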
Production Debug Guide

Symptom → Action mapping for common AutoSys architecture failures.

Symptom: All jobs on a specific machine are stuck in PEND_MACH and no new jobs start.
Action: The Remote Agent has likely crashed. Check if the agent process is running: ps -ef | grep autosys_agent. Check disk space on the agent machine: df -h /var. Check the agent log: /var/log/autosys/agent.log. Restart the agent: /etc/init.d/autosys_agent restart. Then force start the stuck jobs: sendevent -E FORCE_STARTJOB -J job_name.

Symptom: No new jobs start anywhere; all jobs are stuck regardless of machine.
Action: The Event Processor may be down. Check Event Processor status: autoping. If it is down, restart it: eventor. Also check Event Server database connectivity: isql -S autosys (Sybase). Check whether the Event Processor process exists: ps -ef | grep eventor.

Symptom: Job status is inconsistent; a job shows SUCCESS but its child didn't run.
Action: Events may be missing from the Event Server. Check the Event Processor log: $AUTOSYS/log/event_processor.log. Look for 'lost event' or 'queue overflow'. Increase the Event Server buffer size or purge old events.

Symptom: Duplicate jobs executed; the same job runs twice at the same time.
Action: Two Event Processor instances are running. Check: ps -ef | grep eventor | grep -v grep | wc -l should be 1. If it is greater than 1, kill the duplicate processes. Prevent this by using flock on an eventor lock file.

Symptom: Job status updates are delayed by hours; a finished job still shows RUNNING.
Action: Suspect Remote Agent network latency or Event Server overload. Check the Event Processor polling interval (default 30 seconds). For high-volume environments, increase Event Processor threads: max_threads in configuration.

Before you write a single line of JIL or schedule your first job, it helps to understand how AutoSys actually works under the hood. The architecture is straightforward but knowing what each component does — and why — will save you a lot of head-scratching when things go wrong in production.

AutoSys has four major components that work together: the Event Server, the Event Processor, Remote Agents, and client tools. Each has a clear job, and understanding the flow between them makes debugging much easier.

By the end you'll know exactly how job definitions flow from JIL to Event Server to Event Processor to Remote Agent and back. You'll understand what PEND_MACH means and why it's the most common production issue. And you'll know the component that, when it fails, stops all job scheduling.

The Event Server — the source of truth

The Event Server is a relational database (typically Sybase or Oracle) that stores everything AutoSys needs to operate. This includes all job definitions (what to run, when, where, under which conditions), all events that have occurred (job started, job succeeded, job failed), global variable values, machine definitions, calendar definitions, and monitor and report definitions.

When a job finishes and reports its status, that status goes into the Event Server. When the Event Processor needs to know whether a dependent job's condition is met, it queries the Event Server. It's the single source of truth for the entire AutoSys environment.

🔥High availability with dual event servers
AutoSys supports a primary/shadow Event Server configuration. If the primary goes down, the shadow takes over automatically — no manual intervention needed. This is critical for environments that can't afford job scheduling downtime.
📊 Production Insight
The Event Server is the single point of failure for all AutoSys metadata. If it goes down, no job definitions can be read, no status updates can be written.
Dual Event Servers (primary/shadow) provide failover with no downtime. The switch to the shadow is automatic and driven by a heartbeat, so no manual failover step is needed.
Rule: Monitor Event Server CPU, disk I/O, and table sizes. A 500GB Event Server table with no indexes will cause 30-second query delays, stalling all job scheduling.
🎯 Key Takeaway
Event Server database stores everything — job definitions, event history, machine definitions, calendars. It is the source of truth.
Dual Event Servers provide high availability. Always configure shadow server for critical environments.
Rule: Purge old event history regularly (db_purge_events) to keep query performance acceptable.
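A regular purge needs a cutoff date. The following is a minimal sketch that computes one for a retention window, assuming GNU `date` (the `-d` option is not available in every `date` implementation). The helper name `purge_cutoff` is hypothetical, and the `db_purge_events` invocation is shown exactly as this article names it; its flags are not verified here.

```shell
# purge_cutoff [DAYS]: print the date DAYS days ago as MM/DD/YYYY.
# Defaults to 90 days. Requires GNU date (assumption).
purge_cutoff() {
  days="${1:-90}"
  date -d "$days days ago" +%m/%d/%Y
}

# print_purge_command [DAYS]: show the purge command without running it,
# so the cutoff can be reviewed first.
print_purge_command() {
  echo "db_purge_events -date '$(purge_cutoff "${1:-90}")'"
}
```

Printing the command instead of executing it is deliberate: a purge is destructive, so a dry run that a person reviews is a safer default for a first scheduled job.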

The Event Processor — the brain

The Event Processor (also called the scheduler or the event daemon) is the most important component. It runs continuously, polling the Event Server for events. When it detects that a job's starting conditions are met — the right time has arrived, dependent jobs have succeeded, the machine is available — it triggers the job to run on the appropriate agent.

The Event Processor also handles time-based scheduling, evaluates job condition logic, and manages the overall state machine for each job. On Unix/Linux it's started with the eventor command. There is only ever one Event Processor running per AutoSys instance.

io/thecodeforge/autosys/start_event_processor.sh · BASH
# Start the AutoSys event processor (UNIX only)
eventor

# Check if AutoSys components are up
autoping

# Check AutoSys flags and system status
autoflags -a
📊 Production Insight
The Event Processor runs exactly once per AutoSys instance. A second instance causes duplicate job runs and state corruption.
The Event Processor polls the Event Server at configurable intervals (default 30 seconds). This means job conditions are not evaluated in real time.
Rule: Never run two Event Processors. Use flock in startup scripts. Monitor the eventor process count with cron, excluding the grep process itself from the count: if [ $(ps -ef | grep eventor | grep -v grep | wc -l) -ne 1 ]; then alert; fi.
🎯 Key Takeaway
Event Processor is the decision engine — evaluates conditions, triggers jobs on Remote Agents.
Only ONE Event Processor per AutoSys instance. Duplicate eventors cause duplicate job execution.
Rule: Monitor Event Processor uptime. If it dies, no new jobs start. Running jobs continue to completion.
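The "exactly one Event Processor" rule can be checked with a short script. This is a minimal sketch, assuming the process name is exactly `eventor` as used throughout this article; `count_named_procs` and `check_single_scheduler` are hypothetical helper names.

```shell
# count_named_procs NAME: read one command name per line on stdin
# (the format produced by `ps -e -o comm=`) and print how many equal NAME.
# Matching the exact name avoids counting grep or editor sessions.
count_named_procs() {
  awk -v n="$1" '$1 == n { c++ } END { print c + 0 }'
}

# check_single_scheduler: return non-zero and print an alert unless
# exactly one eventor process is running.
check_single_scheduler() {
  count=$(ps -e -o comm= | count_named_procs eventor)
  if [ "$count" -ne 1 ]; then
    echo "ALERT: expected 1 eventor process, found $count" >&2
    return 1
  fi
}
```

A count of 0 means no new jobs are being started; a count above 1 means the duplicate-execution risk described above. Both cases deserve the same alert, which is why the check tests for "not equal to 1" instead of "greater than 1".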

Remote Agents — where the work actually happens

Remote Agents run on every machine where AutoSys needs to execute jobs. When the Event Processor decides a job should run, it sends a message to the Remote Agent on the target machine. The agent starts the process, monitors it, captures the exit code, and reports the result back to the Event Server.
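The run, capture, and report cycle described above can be illustrated with a few lines of shell. This is a conceptual sketch only, not how the agent is actually implemented: it assumes the common convention that exit code 0 means success and anything else means failure, and the names `run_and_report` and `JOB_LOG` are hypothetical.

```shell
# run_and_report CMD [ARGS...]: run a command the way an agent conceptually
# does, send its output to a log, and print a one-line status report.
run_and_report() {
  rc=0
  # Job output goes to JOB_LOG (default: discarded) so it does not mix
  # with the status line printed below.
  "$@" >>"${JOB_LOG:-/dev/null}" 2>&1 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "SUCCESS exit=$rc"
  else
    echo "FAILURE exit=$rc"
  fi
}
```

For example, `run_and_report sh -c 'exit 3'` prints `FAILURE exit=3`. The important point for debugging is where each piece lives: the job's own output is on the agent machine, while only the final status travels back to the Event Server.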

Agents can be extended with plugins for specific integrations — SAP, Oracle E-Business, PeopleSoft, and others. If an agent goes down, jobs that are supposed to run on that machine go into PEND_MACH status and wait until the agent comes back up.

⚠ PEND_MACH is one of the most common production issues
If a machine's filesystem fills up, the agent service crashes and all jobs on that machine go PEND_MACH. This is a very common production incident — always monitor disk space on agent machines.
📊 Production Insight
The Remote Agent is a lightweight process, but it can crash silently. No alarm is raised by default.
When an agent goes down, jobs already running continue but cannot report completion, and new jobs cannot start.
Rule: Monitor agent heartbeat via Event Processor. If agent unreachable for >5 minutes, send alert. Monitor disk space, memory, and agent process existence on each agent machine.
🎯 Key Takeaway
Remote Agents execute jobs on target machines and report results back to Event Server.
PEND_MACH status means agent is unreachable. Jobs stay in PEND_MACH even after agent recovers; must force start.
Rule: Monitor agent disk space and process health. An agent crash is silent without external monitoring.
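The force-start recovery can be wrapped in a script with a dry-run default. This is a minimal sketch that makes the same assumptions as the one-line loop in this article: that the report text contains the literal status `PEND_MACH` and that the job name is the first column. Verify both against your own `autorep` output before relying on it. The function names and the `DRY_RUN` variable are hypothetical.

```shell
# pend_mach_jobs: read job report lines on stdin and print the first field
# (assumed to be the job name) of every line containing PEND_MACH.
pend_mach_jobs() {
  awk '/PEND_MACH/ { print $1 }'
}

# force_start_jobs: read job names on stdin and force-start each one.
# With DRY_RUN=1 (the default) it only prints the commands it would run.
force_start_jobs() {
  while read -r job; do
    [ -n "$job" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "sendevent -E FORCE_STARTJOB -J $job"
    else
      sendevent -E FORCE_STARTJOB -J "$job"
    fi
  done
}
```

A typical use would be `autorep -J % -d | pend_mach_jobs | force_start_jobs` to review the list, followed by the same pipeline with `DRY_RUN=0` once the list looks right. Forcing hundreds of jobs at once can overload a machine that has just recovered, so reviewing first is worth the extra step.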

Client tools — how you interact with AutoSys

Client tools are the interfaces you use to define, manage, and monitor jobs. The main ones are: jil — the command-line JIL processor for creating and modifying job definitions; autorep — reports job status and definitions; sendevent — manually triggers events like starting a job or putting it on hold; autostatus — checks the current status of a specific job; and the WCC Web UI — a browser-based dashboard for monitoring job flows visually.

Most experienced AutoSys administrators work primarily from the command line using jil, autorep, and sendevent. The GUI is useful for monitoring and for people less comfortable with CLI.
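To make the jil workflow concrete, here is a minimal sketch that generates a basic command-job definition. The helper name `render_cmd_job` is hypothetical, and the job name, machine, script path, and 02:00 start time are illustrative values; adjust the attributes to your site's standards.

```shell
# render_cmd_job NAME MACHINE COMMAND: print a minimal JIL definition for a
# command job that runs every day at 02:00 (illustrative schedule).
render_cmd_job() {
  cat <<EOF
insert_job: $1
job_type: CMD
machine: $2
command: $3
date_conditions: 1
days_of_week: all
start_times: "02:00"
EOF
}
```

The output can be reviewed first and then piped to the JIL processor, for example `render_cmd_job daily_report prod-server-01 /path/to/script | jil`. Keeping definitions as generated text also makes them easy to store in version control.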

io/thecodeforge/autosys/autosys_client_commands.sh · BASH
# Check status of a specific job
autostatus -J daily_report

# Get a detailed report on a job
autorep -J daily_report -d

# List a box and the jobs inside it (a trailing % would instead match every job whose name starts with box_name)
autorep -J box_name

# Check machine status
autorep -M prod-server-01
📊 Production Insight
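Client tools read and write Event Server data. In the architecture shown in the diagram, their requests are routed through the Application Server instead of going to the database directly.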
jil and autorep are essential for scripting and automation. The GUI is convenient but adds no functionality.
Rule: Write scripts using autorep -J % -d for monitoring, sendevent -E FORCE_STARTJOB for recovery. Avoid manual GUI actions in automated recovery procedures.
🎯 Key Takeaway
jil defines jobs, autorep reports status, sendevent triggers manual events, WCC GUI provides visual monitoring.
CLI tools are scriptable; use them in automation. GUI is fine for ad-hoc monitoring.
Rule: For production support, master the CLI tools. The GUI is not available in an SSH session.
🗂 AutoSys Components Comparison
Each component has a distinct role in the scheduling pipeline
Component | Type | Runs On | Key Responsibility | Failure Impact | How to Monitor
Event Server | Database (Sybase/Oracle) | Dedicated server | Stores all job definitions, events, state | Catastrophic: no job definitions can be read, no status updates | Check database connectivity, table sizes, I/O latency, dual server sync
Event Processor | Daemon/Service | AutoSys server | Evaluates conditions, triggers jobs | Severe: no new jobs start, running jobs continue | autoping, process check for eventor (ps -ef with grep), check log for errors
Remote Agent | Service | Every target machine | Executes jobs, reports results | Local: jobs on that machine go PEND_MACH, other machines unaffected | Check process exists, disk space, network connectivity to Event Server
jil | CLI client | Any client machine | Define/modify job definitions | None (if jil fails, use another client) | N/A (tool exits with non-zero code on error)
autorep | CLI client | Any client machine | Report job status and definitions | None (use another client) | N/A
sendevent | CLI client | Any client machine | Manually trigger events (START, HOLD, etc.) | None (use another client) | N/A
WCC (Web UI) | Browser GUI | Browser | Visual monitoring and management | None (use CLI if GUI down) | Check WCC service status, HTTP response

🎯 Key Takeaways

  • AutoSys has four core components: Event Server (database), Event Processor (scheduler), Remote Agents (job executors), and client tools.
  • The Event Server is the single source of truth — every job definition, event, and status lives there.
  • The Event Processor continuously evaluates job conditions and triggers agents — it never executes jobs directly.
  • Remote Agents run on target machines and execute the actual work, reporting results back to the Event Server.
  • PEND_MACH is one of the most common production issues and is caused by agent machines going offline or running out of disk space.

⚠ Common Mistakes to Avoid

    Assuming the Event Processor runs the job itself — it doesn't
    Symptom

    The team debugs a failing job, but the Event Processor logs show nothing, because they are looking in the wrong place for job output.

    Fix

    Understand the flow: Event Processor triggers Remote Agent, which executes the job. Check agent logs on target machine, not Event Processor logs.

    Running multiple Event Processor instances on the same AutoSys instance
    Symptom

    Duplicate job runs. Same job starts twice at same time. Event Server state becomes corrupted.

    Fix

    Ensure only one eventor process: ps -ef | grep eventor | grep -v grep | wc -l should be 1. Use a lock file in the startup script: flock -n /var/lock/eventor.lock -c eventor.

    Not monitoring the Event Server database size
    Symptom

    Event history accumulates for years. Table sizes grow to 500GB+ (AE_QUEUE, AE_EVENTS). Query performance degrades, job scheduling slows down.

    Fix

    Regularly purge old events: db_purge_events -date '01/01/2026'. Run analyze table on Event Server tables. Set retention policy: keep events for 90 days, archive older events to flat file.

    Forgetting that the agent user account needs the right permissions
    Symptom

    Job fails immediately with permission denied. Agent logs show 'cannot execute command as user x'.

    Fix

    Ensure the AutoSys agent user (typically autosys) has execute permission on job scripts and read/write access to log directories. Test by su - autosys -c '/path/to/script' before scheduling.

    Not handling PEND_MACH recovery after agent restart
    Symptom

    Agent restarts, but jobs remain stuck in PEND_MACH status. Team restarts agent, but jobs still don't run.

    Fix

    PEND_MACH does NOT auto-resolve. After agent restart, force start jobs: for job in $(autorep -J % -d | grep PEND_MACH | awk '{print $1}'); do sendevent -E FORCE_STARTJOB -J $job; done. Or use sendevent -E FORCE_STARTMACH -M machine_name to restart all jobs on that machine.

Interview Questions on This Topic

  • Q: What are the four main components of AutoSys architecture? (Junior)
    Event Server (database storing job definitions and events), Event Processor (scheduler daemon that evaluates conditions and triggers jobs), Remote Agent (executes jobs on target machines), Client Tools (jil, autorep, sendevent, WCC UI for user interaction). The Event Server is the source of truth. The Event Processor is the decision engine. Remote Agents are the workers. Client tools are the interfaces.
  • Q: What does the Event Processor do and how does it interact with the Event Server? (Mid-level)
    The Event Processor runs continuously, polling the Event Server database for events and conditions. When it detects that a job's starting conditions are met (time, dependencies, machine availability), it sends a trigger message to the Remote Agent on the target machine. It also updates job status in the Event Server based on agent reports. The Event Processor never executes jobs directly — it only triggers agents. It runs as a single daemon (eventor) per AutoSys instance. Multiple eventors cause duplicate job runs and state corruption.
  • Q: What happens to jobs when a Remote Agent machine goes down? (Mid-level)
    Jobs that are already running on that machine continue to run because they're independent processes, but the agent cannot report their status back to the Event Server. Completed jobs will appear as RUNNING in AutoSys, and new jobs scheduled on that machine go into PEND_MACH (pending machine) status, waiting for the agent to become available. When the agent restarts, jobs remain in PEND_MACH and must be manually forced to start using sendevent -E FORCE_STARTJOB or sendevent -E FORCE_STARTMACH. There is no auto-recovery for PEND_MACH. PEND_MACH is the most common production issue. It often results from a full filesystem (agent cannot write logs) or a crashed agent process. Always monitor agent disk space.
  • Q: What is PEND_MACH status and what causes it? (Junior)
    PEND_MACH (Pending Machine) means the job is ready to run but the Remote Agent on the target machine is unreachable. Causes include: agent process crashed, disk full (agent cannot write logs), network partition between Event Processor and agent, agent service stopped manually, or machine down. PEND_MACH is critical because jobs stay stuck even after agent recovers; they require manual sendevent -E FORCE_STARTMACH to resume. Monitoring agent disk space (especially /var) and process existence prevents PEND_MACH incidents.
  • Q: Can you run multiple Event Processors for the same AutoSys instance? (Senior)
    No. You must run exactly one Event Processor per AutoSys instance. Running multiple eventors causes: duplicate job execution (same job runs twice), state corruption in Event Server (conflicting updates), race conditions in job scheduling, and unpredictable behaviour. The startup script should use file locking (flock) to prevent multiple instances. Monitor with: if [ $(ps -ef | grep eventor | grep -v grep | wc -l) -ne 1 ]; then alert; fi. If duplicate eventors are found, kill all but one and investigate how they started.

Frequently Asked Questions

What database does AutoSys use for the Event Server?

AutoSys traditionally used Sybase as its backend database. Newer versions also support Oracle. The Event Server stores all job definitions, events, and system state.

What happens if the Event Processor crashes?

If the Event Processor goes down, no new jobs will be triggered. Jobs that are already running will continue until they complete. AutoSys supports a shadow scheduler that can take over if the primary Event Processor fails, providing high availability.

Can I run AutoSys jobs on Windows machines?

Yes. AutoSys Remote Agents are available for both Unix/Linux and Windows. You can schedule jobs to run on Windows machines the same way as Unix machines, though some command-line tools like eventor are Unix-only.

What is the difference between the Event Server and the Event Processor?

The Event Server is the database that stores all data. The Event Processor is the daemon that reads from the Event Server, evaluates job conditions, and triggers jobs. One is data storage; the other is the decision engine.

How do I recover from PEND_MACH status?

First, check the agent machine: disk space (df -h), agent process (ps -ef | grep autosys_agent), agent logs (/var/log/autosys/agent.log). Restart agent if needed: /etc/init.d/autosys_agent restart. Then force start jobs: sendevent -E FORCE_STARTMACH -M machine_name. Stuck jobs will then run immediately. Without force start, jobs remain PEND_MACH forever.

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
