Intermediate 3 min · March 19, 2026

AutoSys Interview Questions and Answers

AutoSys Interview Questions — 50 Bank & Insurance Q&As

Q: What AutoSys questions come up most in interviews?

The most common questions, in order:

1. ON_HOLD vs ON_ICE (appears in almost every interview)
2. PEND_MACH causes and resolution
3. date_conditions default and meaning
4. FORCE_STARTJOB vs STARTJOB
5. box_terminator definition and use case
6. Design an end-of-day batch workflow

If you can answer these six questions confidently, you'll pass most AutoSys interviews. The rest are details.

Q: What is the most commonly asked AutoSys interview question?

The ON_HOLD vs ON_ICE question appears in nearly every AutoSys interview. The key answer: releasing from ON_HOLD starts the job immediately if its starting conditions are currently met; releasing from ON_ICE makes the job wait for its conditions to reoccur in the next scheduling cycle, so it does not start immediately.

Example to solidify: A job runs at midnight. At 2 PM, you put it ON_ICE. At 3 PM, you release it. It runs at midnight, not at 3 PM. Interviewers love this question because many candidates get it wrong or give a vague answer.

Q: Do I need hands-on AutoSys experience to pass the interview?

For operational roles (SRE, batch operations, production support), yes: interviewers ask specific command syntax and scenario questions that require real experience. Studying concepts is necessary but not sufficient.

For architecture or design roles (where job postings say 'AutoSys knowledge preferred'), you may pass with strong conceptual understanding and transferable experience from other schedulers (Control-M, TWS).

If you don't have direct experience, get as close to the tool as you can: ask whether a trial or lab environment is available to you, or document your company's existing AutoSys setup. Even just reading real JIL definitions is valuable.

Q: What is the PEND_MACH answer in AutoSys interviews?

PEND_MACH means the Remote Agent on the target machine is unavailable.

Causes (in order of likelihood):
1. Full disk on the agent machine, which stops the agent service
2. Agent service not running
3. Machine offline
4. Network issue
5. Firewall blocking the agent port (7520 by default)

Diagnosis:
1. SSH to the agent machine and check disk space: `df -h`
2. Check that the agent process is running: `ps -ef | grep autosys`
3. Test the port from the scheduler host: `telnet agent-host 7520`

Fix:
- Full disk: remove old logs, for example `sudo rm /tmp/autosys_logs/*` if that is where your install writes them, then restart the agent
- Agent stopped: start the agent service
- Network: engage the network team

Interview tip: Say 'disk space is the most common cause, so I check df -h first' to show operational experience.
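The diagnosis steps above can be sketched as a small triage helper. This is a minimal sketch rather than an AutoSys tool: the function names and the 95% threshold are hypothetical, the disk check only makes sense when run on the agent machine itself, and the port check is meant to run from the scheduler side. Port 7520 is the value quoted in the answer above.

```python
import shutil
import socket


def disk_used_percent(path: str = "/") -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def triage(agent_host: str, agent_port: int = 7520, disk_path: str = "/") -> list:
    """Collect findings in the order the answer recommends: disk first, then port."""
    findings = []
    used = disk_used_percent(disk_path)
    if used >= 95.0:  # hypothetical threshold, tune for your environment
        findings.append(f"disk {disk_path} is {used:.0f}% full")
    if not port_is_open(agent_host, agent_port):
        findings.append(f"cannot reach {agent_host}:{agent_port}")
    return findings
```

An empty list from `triage` does not prove the agent is healthy; it only rules out the two most common causes listed above.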

Q: How do I explain AutoSys to a non-technical interviewer?

AutoSys is an enterprise scheduling and orchestration tool. It runs thousands of batch jobs every night — like payroll, financial reconciliation, data processing — across hundreds of servers. Think of it like a very sophisticated alarm clock for a company's servers. It wakes up programs at specific times, in a specific order, waits for them to finish, and immediately alerts the team if anything goes wrong. It can also trigger jobs when a file arrives (event-driven) rather than at a fixed time. Large banks and insurance companies use it because they can't afford for payroll or trade settlement to fail or run out of order. This explanation works for hiring managers from non-technical backgrounds (HR, staffing agencies).

ON_HOLD runs immediately on release; ON_ICE waits for next cycle.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Drawn from code that ran under real load.

✓ Production

production tested

July 27, 2026

last updated

1,750

articles · all by Naren

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts


⚡Quick Answer

ON_HOLD: releasing starts job immediately if conditions met. ON_ICE: releasing waits for next scheduling cycle. Most common wrong answer.
PEND_MACH = agent unreachable. First check: disk space on agent (df -h). 90% of cases.
date_conditions defaults to 0 (time scheduling disabled). Most people assume it's 1.
FORCE_STARTJOB bypasses ALL conditions (time AND dependencies). STARTJOB respects them.
box_terminator: 1 stops entire box when job fails. Use on validation jobs only.
Global variables: SET_GLOBAL writes, autostatus -G reads, value() in JIL conditions.

✦ Definition~90s read

What is AutoSys?

AutoSys is a distributed job scheduling and workload automation platform from CA Technologies (now Broadcom), used primarily in financial services and insurance to orchestrate batch processing, ETL pipelines, and regulatory reporting. It solves the problem of managing thousands of interdependent jobs across heterogeneous systems (UNIX, Windows, mainframes) with deterministic sequencing, time-based triggers, and event-driven execution.


This article collects the AutoSys questions that actually come up in interviews at banks, insurance companies, and enterprise IT shops — and gives you the complete, correct answers, not the vague half-answers you'll find elsewhere.

In banking and insurance, AutoSys is the backbone for overnight batch cycles that process trades, calculate risk, generate statements, and feed downstream systems. Failures here mean regulatory breaches or multi-million dollar losses. The platform's core components are the Event Server (a persistent database that stores job definitions and tracks job state), the Event Processor (the scheduler daemon that evaluates conditions and starts jobs), the Agent (runs on each machine to execute jobs), and the client interfaces (GUI, CLI, JIL).

JIL (Job Information Language) is the declarative language used to define jobs, dependencies, conditions, and calendars. Think of it as cron on steroids with cross-system awareness. Alternatives include Control-M, Tivoli Workload Scheduler, and open-source tools like Airflow or Rundeck, but AutoSys remains entrenched in legacy banking environments because of its long track record, failover support, and audit trail capabilities.

You should not use AutoSys for real-time streaming, microservice orchestration, or lightweight cron replacements — it's designed for heavy, stateful, multi-step batch workflows where job order and error recovery are non-negotiable. The interview questions in this article target the specific failure modes and edge cases that trip up candidates: what happens when a job stays in RUNNING state for 8 hours, how to handle event server failover without losing job history, and why 'status: SUCCESS' doesn't always mean the job actually ran correctly.

Plain-English First

AutoSys interviews are specific. Interviewers know the tool. Vague answers about 'scheduling jobs' fail.

This guide assumes you've worked through the other articles in this track. It's your review. The questions are organised from foundational to advanced. The answers are complete, not truncated.

The most common wrong answer? ON_HOLD vs ON_ICE. That question appears in almost every interview. Get it right.

What AutoSys Interview Questions Actually Test

AutoSys is an enterprise job scheduling and workload automation tool used to define, manage, and monitor batch jobs across distributed systems. Its core mechanic is the Job Information Language (JIL), a declarative syntax that specifies job attributes like command, machine, start condition, and dependencies. Interview questions probe your ability to translate business scheduling requirements into JIL definitions and troubleshoot job failures in complex dependency chains.

In practice, AutoSys jobs are event-driven: a job triggers based on time, file arrival, or the exit code of another job. Key properties include box jobs (containers for grouping), global variables, and condition expressions (e.g., 'success(jobA) AND exitcode(jobB) = 0'). Understanding how AutoSys handles job states (SUCCESS, FAILURE, TERMINATED, RUNNING) and the role of the Event Server and Remote Agent is critical for answering scenario-based questions.

Use AutoSys when you need reliable, auditable batch processing with cross-platform orchestration — common in banking for end-of-day reconciliations, report generation, or data warehouse loads. It matters because a misconfigured dependency or missing file trigger can halt a critical business process, causing SLA breaches. Interviewers look for candidates who can design resilient schedules with proper error handling, restart logic, and monitoring.

⚠ JIL Is Not a Scripting Language

AutoSys JIL defines job metadata and dependencies, not logic. Do not confuse JIL conditions with shell scripting — they only evaluate exit codes and job states.

📊 Production Insight

A bank's nightly trade settlement job chain failed because a predecessor job succeeded with exit code 1 (a warning), but the downstream job's condition required exitcode(prev) = 0.

Symptom: downstream job never triggered despite upstream completing; operations team found no error in logs, only a 'condition not met' status.

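Rule: always define success criteria explicitly. Either condition on 'success(prev)' and set 'max_exit_success' on the upstream job so that warning exit codes count as SUCCESS, or change the script so that warnings exit 0. Attributes such as 'term_run_time' and 'max_run_alarm' guard against long runtimes, not against unexpected exit codes.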
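The mismatch in this incident is easier to see when the two condition styles are evaluated side by side. The following is a simplified model, not AutoSys code: the function names are hypothetical, and it assumes the upstream job uses the JIL attribute max_exit_success to treat exit code 1 as a successful run.

```python
def ends_in_success(exit_code: int, max_exit_success: int = 0) -> bool:
    """Model of job status: SUCCESS when the exit code is at most max_exit_success."""
    return 0 <= exit_code <= max_exit_success


def condition_success(exit_code: int, max_exit_success: int = 0) -> bool:
    """Model of a downstream 'success(prev)' condition."""
    return ends_in_success(exit_code, max_exit_success)


def condition_exitcode_zero(exit_code: int) -> bool:
    """Model of a downstream 'exitcode(prev) = 0' condition."""
    return exit_code == 0


# Upstream exits 1 (a warning) and is configured with max_exit_success: 1.
upstream_exit = 1
print("upstream SUCCESS:", ends_in_success(upstream_exit, max_exit_success=1))
print("success(prev) met:", condition_success(upstream_exit, max_exit_success=1))
print("exitcode(prev) = 0 met:", condition_exitcode_zero(upstream_exit))
```

The upstream job reports SUCCESS, yet the exit-code condition is never satisfied, which matches the 'condition not met' symptom described above.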

🎯 Key Takeaway

AutoSys interview questions are 80% JIL syntax and condition logic, 20% troubleshooting failed jobs.

Master box jobs and global variables to reduce duplication and simplify dependency management.

Always design for restartability: use 'n_retrys' for automatic restarts and 'term_run_time' to kill hung runs, so transient failures are handled without manual intervention.

thecodeforge.io

Autosys Interview Questions

Architecture and concepts

These questions test whether you understand what AutoSys actually is and how it works internally. They're usually early in the interview to establish baseline knowledge.

architecture_qa.txt (BASH)

Q: What is AutoSys and what problem does it solve?
A: AutoSys is Broadcom's enterprise workload automation platform for scheduling,
   monitoring, and orchestrating batch jobs across multiple servers. It solves the
   scalability problems of cron: dependency management, centralised visibility, alerting,
   audit trails, and multi-server coordination.

Q: What are the main components of AutoSys architecture?
A: Event Server (database storing all definitions and events), Event Processor
   (scheduling daemon that evaluates conditions and triggers agents), Remote Agents
   (lightweight processes on each target machine), and Clients (CLI tools + WCC web UI).

Q: What happens when the Event Processor goes down?
A: Job triggering stops. Jobs that are currently RUNNING continue to completion (the
   agent handles execution independently), but no new jobs will be triggered until
   the Event Processor is restarted.

🔥Interview tip — Event Processor vs Event Server

Interviewers often ask 'what's the difference?' The Event Server is the database (storage). The Event Processor is the daemon (evaluation). One stores state, the other triggers jobs.

📊 Production Insight

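A candidate answered 'The Event Processor writes to the Event Server' and stopped there. That's incomplete. The Event Processor continuously reads events from the Event Server, evaluates them, and records the resulting status changes back in it; sendevent commands also insert events there. The Event Server holds the state. The processor keeps none of its own.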

The interviewer asked a follow-up: 'If the Event Server goes down, do running jobs continue?' The candidate didn't know. Answer: Yes — the agent runs jobs independently. But job completion status cannot be written back.

Rule: Know which component does what. If you confuse direction, you fail the architecture section.

🎯 Key Takeaway

Event Server = database (storage). Event Processor = daemon (evaluation).

Event Processor down = no new jobs. Running jobs continue.

Agent down = jobs on that machine go PEND_MACH. Other agents are fine.

Know the failure modes: silent, not sudden.

Component failure — what happens to jobs?

If this fails	What happens to jobs
Event Processor crashes	Running jobs continue. No new jobs start. Status updates queue.
Event Server unreachable	Running jobs continue. Completion status can't be saved. Agent may retry.
Remote Agent on machine down	Jobs on that machine go PEND_MACH. Other machines unaffected.
Network between server and agent down	Jobs on that machine go PEND_MACH. Agent can't start jobs or report status.

JIL and job operations

These test practical JIL knowledge — what interviewers really want to know is whether you've actually used the tool, not just read about it.

jil_operations_qa.txt (BASH)

Q: What is the difference between insert_job and update_job?
A: insert_job creates a new job definition — fails if job already exists.
   update_job modifies an existing job (partial update, only changed attributes).
   Fails if job doesn't exist.

Q: What is the difference between delete_job and delete_box?
A: delete_job on a box removes only the box, leaving inner jobs as standalone.
   delete_box removes the box AND all its inner jobs.

Q: How do you back up AutoSys job definitions?
A: autorep -J % -q > backup_$(date +%Y%m%d).jil
   This dumps all job definitions in JIL format to a file.

Q: How do you view the JIL definition of an existing job?
A: autorep -J jobname -q

Q: What does FORCE_STARTJOB do differently from STARTJOB?
A: FORCE_STARTJOB starts the job immediately bypassing all conditions
   (date_conditions, start_times, condition attribute). STARTJOB only triggers
   if conditions are currently met.

⚠ Most missed JIL question: delete_job vs delete_box

On a box: delete_job removes the box container. Inner jobs become standalone. delete_box removes box AND inner jobs. This is a common trick question — if you say 'delete_job removes the box and its jobs', you're wrong.

📊 Production Insight

An operations engineer used delete_job on a production box thinking it would remove all inner jobs. It didn't. The box vanished. All inner jobs became orphaned standalone jobs. They continued running on their own schedules, independent of dependencies.

A trading settlement job ran 4 hours early because its parent box was gone. The box had enforced a start time. Without the box, the job ran at its own start time — which was 2 PM, not 6 PM.

Recovery: regenerate box definition from backup (autorep -J boxname -q had been saved). Reinsert box. Reassociate inner jobs with box_name attributes.

Rule: Always have current JIL backups. autorep -J % -q weekly. Delete box? Use delete_box or expect orphaned jobs.
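The backup rule above is easy to automate. Here is a minimal sketch of a scheduled backup wrapper, assuming autorep is on the PATH of the account that runs it; the function names and the target directory are hypothetical. It uses the same `autorep -J % -q` command and dated filename shown in the Q&A.

```python
import shutil
import subprocess
from datetime import date
from pathlib import Path
from typing import Optional


def backup_filename(day: date) -> str:
    """Build the dated filename used in the Q&A, e.g. backup_20260319.jil."""
    return f"backup_{day:%Y%m%d}.jil"


def backup_all_jobs(target_dir: Path, day: Optional[date] = None) -> Optional[Path]:
    """Dump every job definition as JIL. Returns None if autorep is not installed."""
    if shutil.which("autorep") is None:
        return None
    out_path = target_dir / backup_filename(day or date.today())
    with out_path.open("w") as handle:
        # Equivalent to: autorep -J % -q > backup_YYYYMMDD.jil
        subprocess.run(["autorep", "-J", "%", "-q"], stdout=handle, check=True)
    return out_path


if __name__ == "__main__":
    result = backup_all_jobs(Path("."))
    print(result if result else "autorep not found; nothing backed up")
```

Passing the command as a list avoids shell quoting problems with the `%` wildcard, and `check=True` makes a failed dump raise instead of leaving a silently truncated backup.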

🎯 Key Takeaway

insert_job vs update_job: create vs modify. delete_job vs delete_box: box-only vs box+children.

Backups: autorep -J % -q > backup.jil — do this weekly.

FORCE_STARTJOB bypasses ALL conditions. STARTJOB respects them.

JIL job names are case-sensitive on Linux: JOB and Job are different jobs.


Status codes and troubleshooting

These test operational knowledge — have you actually been on-call for an AutoSys environment? Interviewers love status code questions because they separate theory from practice.

status_trouble_qa.txt (BASH)

Q: What does PEND_MACH mean and what usually causes it?
A: PEND_MACH (PE) means the Remote Agent on the target machine is unavailable.
   Most common cause: the agent machine's filesystem is 100% full, stopping the
   agent service. Check disk space first: ssh machine01 'df -h'

Q: What is the difference between ON_HOLD and ON_ICE?
A: ON_HOLD: releasing (OFF_HOLD) starts the job immediately if conditions are currently met.
   ON_ICE: releasing (OFF_ICE) makes the job wait for conditions to reoccur in the
   next scheduling cycle — it does not start immediately.

Q: A job was failing every night for a week. What's your troubleshooting approach?
A: 1. Check std_err_file for the error pattern
   2. Check if it's always the same exit code (consistent root cause)
   3. Check autorep -J jobname -r -1 (then -r -2, and so on) to compare recent runs
   4. Check if it correlates with system events (deployments, maintenance)
   5. Engage the application team who owns the script

Q: How do you unblock downstream jobs after manually fixing a failed job?
A: sendevent -E CHANGE_STATUS -J fixed_job -s SUCCESS
   This marks the job SUCCESS so all downstream success() conditions are met.

⚠ ON_HOLD vs ON_ICE — the most common wrong answer

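Most candidates say 'they're the same'. They're not. OFF_HOLD starts the job immediately if its conditions are met. OFF_ICE waits for the next schedule. A second difference worth mentioning: jobs that depend on an ON_ICE job treat that dependency as satisfied, while jobs that depend on an ON_HOLD job keep waiting. If you get this wrong, you fail the status section. Know it cold.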

📊 Production Insight

A candidate correctly defined ON_HOLD vs ON_ICE. Then the interviewer asked: 'You have a job that runs at midnight. At 2 PM, you put it ON_ICE. At 3 PM, you release it. When does it run?'

The candidate thought: immediately. Wrong. ON_ICE release waits for the next scheduling cycle — midnight. The job ran at midnight, not 3 PM.

The candidate would have failed the real scenario. Operational experience matters more than definitions.

Rule: ON_HOLD = manual overrides during the day. ON_ICE = permanent schedule changes or avoiding out-of-cycle runs.
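The release rule can be written down as a tiny decision function. This is only a model of the behaviour described in this section, with hypothetical names; it is not how the scheduler is implemented.

```python
def on_release(status: str, conditions_met_now: bool) -> str:
    """Describe what happens when a held or iced job is released."""
    if status == "ON_HOLD":
        # OFF_HOLD: the job starts right away only if its conditions are met.
        return "start now" if conditions_met_now else "wait for conditions"
    if status == "ON_ICE":
        # OFF_ICE: conditions that were met while iced do not count.
        return "wait for conditions to reoccur"
    raise ValueError(f"not a held or iced status: {status}")


for status in ("ON_HOLD", "ON_ICE"):
    for met in (True, False):
        print(f"{status:8} conditions_met={met!s:5} -> {on_release(status, met)}")
```

The only input that changes the ON_HOLD outcome is whether the starting conditions are met at release time; for ON_ICE that input is irrelevant.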

🎯 Key Takeaway

PEND_MACH = agent unreachable. First check: disk space.

ON_HOLD = immediate resume. ON_ICE = next scheduled cycle. Learn it.

CHANGE_STATUS -s SUCCESS unblocks downstream after manual fix.

Troubleshooting = logs + trends + correlation + escalation.

Advanced and scenario questions

These test whether you can reason about AutoSys in complex real-world situations. Senior-level interviews focus heavily on this section.

advanced_qa.txt (BASH)

Q: Design an AutoSys workflow for end-of-day batch processing.
A: Use a 3-level hierarchy: master box (overall schedule) → section boxes
   (logical groupings: extract, transform, load, report) → CMD jobs inside each
   section box. Include a pre-check job as box_terminator, n_retrys on I/O jobs,
   alarm_if_fail on all critical jobs, and a post-check job to validate output.

Q: What is box_terminator and when would you use it?
A: box_terminator: 1 on a job means that if that job fails, the scheduler
   terminates the parent box, so inner jobs that have not started do not run.
   Jobs that are already running are stopped only if they set job_terminator: 1.
   Use it on validation/pre-check jobs whose failure makes all downstream
   processing pointless.

Q: How do you handle a scenario where an upstream file sometimes arrives late?
A: Use a File Watcher job (job_type: FW) with a run_window covering the expected
   arrival period and an appropriate watch_file_min_size. The downstream jobs
   condition on success(file_watcher_job). This way processing starts as soon as
   the file arrives rather than at a fixed time that may be too early.

Q: How do you pass data between AutoSys jobs?
A: Using global variables: the upstream script runs sendevent -E SET_GLOBAL
   -G "VAR_NAME=value". Downstream jobs read it via autostatus -G VAR_NAME or
   reference it in JIL conditions with value(VAR_NAME).
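As a sketch of the upstream and downstream halves of that answer, the helpers below only build the command lines; they do not run them. The helper names, the workflow prefix scheme, and the EOD_REC_COUNT variable are hypothetical. The sendevent and autostatus invocations are the ones quoted in the answer above.

```python
import re

_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")


def global_name(workflow: str, variable: str) -> str:
    """Prefix the variable with its workflow name to avoid collisions."""
    name = f"{workflow}_{variable}".upper()
    if not _NAME_RE.match(name):
        raise ValueError(f"unsupported characters in global name: {name!r}")
    return name


def set_global_command(name: str, value: object) -> list:
    """Upstream side: sendevent -E SET_GLOBAL -G "NAME=value"."""
    return ["sendevent", "-E", "SET_GLOBAL", "-G", f"{name}={value}"]


def read_global_command(name: str) -> list:
    """Downstream side: autostatus -G NAME prints the current value."""
    return ["autostatus", "-G", name]


name = global_name("eod", "rec_count")
print(" ".join(set_global_command(name, 1250)))
print(" ".join(read_global_command(name)))
```

In a real job script each list would be handed to `subprocess.run(..., check=True)`, so that a failed sendevent makes the upstream job fail instead of leaving the variable stale.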

💡Senior interview tip — mention trade-offs and alternatives

When asked 'how would you design X', don't just give one answer. Say 'Option A is a box with a File Watcher. Option B is a scheduled job with polling. Option A is better because...' Show you can compare approaches.

📊 Production Insight

A senior candidate was asked 'How would you handle a file that arrives in multiple chunks?'

Junior answer: 'Use a File Watcher.'

Senior answer: 'Use a manifest file. Upstream writes one .ready file after all chunks are complete. File Watcher watches .ready. This prevents triggering on partial data. Alternatively, set watch_file_min_size to the expected final size, but the manifest is more reliable because chunk order is unpredictable.'
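The manifest pattern in the senior answer can be sketched in a few lines. This is an illustration under stated assumptions: the file name batch.ready, the function names, and the one-chunk-name-per-line manifest format are all hypothetical. The point is the ordering: chunks first, manifest last, and the manifest appears atomically.

```python
import os
from pathlib import Path
from typing import Optional

MANIFEST = "batch.ready"  # hypothetical name; this is the file the File Watcher watches


def publish(directory: Path, chunks: dict) -> Path:
    """Upstream side: write every chunk, then create the manifest last."""
    for name, data in chunks.items():
        (directory / name).write_bytes(data)
    tmp = directory / (MANIFEST + ".tmp")
    tmp.write_text("\n".join(sorted(chunks)) + "\n")
    manifest = directory / MANIFEST
    os.replace(tmp, manifest)  # atomic rename on the same filesystem
    return manifest


def ready_chunks(directory: Path) -> Optional[list]:
    """Downstream side: return chunk paths only when the manifest and all chunks exist."""
    manifest = directory / MANIFEST
    if not manifest.exists():
        return None
    names = [line for line in manifest.read_text().splitlines() if line]
    paths = [directory / name for name in names]
    return paths if all(path.is_file() for path in paths) else None
```

Because the manifest is renamed into place only after every chunk is written, a watcher that triggers on it never sees a partial batch, regardless of the order in which chunks arrive.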

The senior answer showed consideration of edge cases, alternatives, and trade-offs. That's what gets the offer.

Rule: At senior level, every answer should include 'it depends' and then explain the trade-offs.

🎯 Key Takeaway

EOD workflow: hierarchical boxes + pre-check terminator + post-check validation.

box_terminator on validation jobs only. Optional jobs should never be terminators.

File Watcher for unpredictable arrival times. Must have watch_file_min_size and run_window.

Global variables pass data. Use workflow prefixes to avoid collisions.

AutoSys Event Server Failover — What the Docs Won't Tell You

You've designed a fault-tolerant architecture. Two Event Processors, dual Event Servers, automatic failover. Then the primary ES dies, and your jobs vanish into a black hole for twenty minutes.

The docs say failover takes 90 seconds. Reality says 5-15 minutes depending on deadlock detection timeouts, unreachable host timeouts, and whatever else your sysadmin configured in the alarm scripts. The Event Server failover timeout is NOT a hard limit. It's a TCP timeout stack. Every hop adds seconds.

Senior engineers set ES_FAILOVER_TIMEOUT to 300 seconds minimum. Anything lower causes false positives when the primary ES is just slow, not dead. And you MUST test failover manually during non-peak hours. Simulate a kill -9 on the ES process. Watch the client machines log. If they don't reconnect within 300 seconds, your network team has a problem.

Production lesson: Event Server redundancy gives you zero benefit if the failover doesn't complete before your SLA breach window. Test your recovery time objective (RTO) quarterly. Your auditor will ask for the proof. Have it ready.

eventServerFailoverTest.yml (YAML)

# io.thecodeforge - devops tutorial
# Simulate ES failover and measure downtime
# Run on a non-production calendar day

event_server_primary:
  host: autosys-es-01.prod.corp
  process_name: eventor
  kill_command: "kill -9 $(pgrep -f eventor)"
  expected_failover_time_seconds: 300
  alert_slack_channel: "#ops-autosys-watchdog"

validation:
  check_every_seconds: 10
  max_retries: 30
  success_condition:
    - "Client machines reconnect to secondary ES"
    - "No scheduled job starts within 5 minutes of failover"
    - "System alarms trigger within 60 seconds"
  failure_action: "Page the on-call architect immediately"

post_test:
  verify_log: "/opt/autosys/logs/es_failover_2025-03-17.log"
  recovery_es_process_restart: true

Output

es_failover_2025-03-17.log

15:30:00 INFO Primary ES kill issued

15:30:02 WARN Client abc-app-01: ES unreachable

15:30:03 WARN Client xyz-db-02: ES unreachable

15:32:15 INFO Secondary ES elected as new primary

15:32:18 INFO Client abc-app-01 reconnected

15:32:19 INFO Client xyz-db-02 reconnected

Total failover time: 135 seconds

Status: PASS (threshold 300s)
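Measuring the drill from the log is worth scripting so the number is not eyeballed. The sketch below parses the sample log shown above; the function name and the line-matching keywords are assumptions based on that sample, and it does not handle a drill that crosses midnight.

```python
from datetime import datetime

SAMPLE_LOG = """\
15:30:00 INFO Primary ES kill issued
15:30:02 WARN Client abc-app-01: ES unreachable
15:30:03 WARN Client xyz-db-02: ES unreachable
15:32:15 INFO Secondary ES elected as new primary
15:32:18 INFO Client abc-app-01 reconnected
15:32:19 INFO Client xyz-db-02 reconnected
"""


def _stamp(line: str) -> datetime:
    """Parse the leading HH:MM:SS timestamp of a log line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S")


def failover_times(log_text: str) -> tuple:
    """Return (seconds until election, seconds until the last client reconnects)."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    start = next(_stamp(ln) for ln in lines if "kill issued" in ln)
    elected = next(_stamp(ln) for ln in lines if "elected" in ln)
    last_reconnect = max(_stamp(ln) for ln in lines if "reconnected" in ln)
    return ((elected - start).total_seconds(),
            (last_reconnect - start).total_seconds())


election_s, recovery_s = failover_times(SAMPLE_LOG)
print(f"election after {election_s:.0f}s, all clients back after {recovery_s:.0f}s")
```

On the sample log this yields 135 seconds to election and 139 seconds until the last client reconnects. The reported 135-second total measures election only; for an RTO claim, the later number is the one that matters.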

⚠ Production Trap:

Most AutoSys teams never test failover. They assume it works. Then the SAN goes down on a Tuesday at 3 PM and your VP wants a root cause in 10 minutes. Run a failure drill quarterly. Document every delay.

🎯 Key Takeaway

ES failover timeouts are additive across network layers. Set ES_FAILOVER_TIMEOUT to 300s minimum. Test actual RTO quarterly, not just config.

Global Virtual Machines — Why Your 'Simple' Box Stops Running

Junior engineers love Global Virtual Machines (GVM). One config, applies everywhere. Then someone adds a new local machine to the same farm, and suddenly jobs stop starting on it for no reason.

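GVM works by matching machine attributes: hostname prefix, OS version, location code, custom tags. If your GVM regex is too permissive, it matches machines you don't intend. If it's too strict, it excludes machines you do intend. Pattern-based membership like this typically lives in a provisioning layer on top of AutoSys: a stock virtual machine definition (type: v) lists its member machines explicitly, and autorep -M <name> -q prints that list. What you usually cannot get from a single query is the pattern-to-host resolution itself, so expect to script that check against your machine inventory.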

Here's what 90% of production incidents boil down to: someone updates the hostname naming convention (e.g., from 'app-[env]-[num]' to 'app-[num]-[env]') but forgets to update the GVM regex. Jobs start failing silently. No alarm. Just machines sitting idle while boxes stay pending.

Fix: Never use hostname prefix in GVM. Use custom machine attributes — location, environment, role — that are explicitly set and reviewed in change management. Then your GVM becomes a simple AND/OR filter on stable tags. Audit your GVMs quarterly against current machine inventory. Automate it or schedule a manual check in Jira.

gvmAuditScript.yml (YAML)

# io.thecodeforge - devops tutorial
# Audit GVM definitions against live machine inventory
# Run weekly cron job. Output to Slack.

gvm_name: "APP_LINUX_PROD_BOX"
regex_pattern: "app-[A-Za-z]{3}-[0-9]{3}"
expected_machines:
  - app-prd-001
  - app-prd-002
  - app-prd-003
  - app-prd-004
live_machines_with_prefix:
  - app-dev-001  # Wrong env — matches accidentally
  - app-prd-005  # Missing from GVM? Check if new

check_config:
  audit_db_job: "AUTOSYS_AUDIT_GVM_DEFINITIONS"
  alert_on_mismatch: true
  slack_channel: "#autosys-gvm-audit"

Output

GVM Audit: APP_LINUX_PROD_BOX

MATCHED (unexpected): app-dev-001

MATCHED (not in expected list): app-prd-005 (newly provisioned)

Action: Update GVM regex to exclude non-prod. Add app-prd-005.
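The audit itself is a set comparison once the pattern is applied to the live inventory. The sketch below is hypothetical throughout: the function name, the simplified pattern `app-[A-Za-z]{3}-[0-9]{3}`, and the hard-coded host lists stand in for whatever your inventory source returns.

```python
import re


def audit(pattern: str, expected: set, live: set) -> dict:
    """Compare what the pattern matches in the live inventory with what was intended."""
    regex = re.compile(pattern)
    matched = {host for host in live if regex.fullmatch(host)}
    return {
        "unexpected": sorted(matched - expected),  # matched but never intended
        "missing": sorted(expected - matched),     # intended but not matched
    }


expected = {"app-prd-001", "app-prd-002", "app-prd-003", "app-prd-004"}
live = expected | {"app-dev-001", "app-prd-005"}

loose = audit(r"app-[A-Za-z]{3}-[0-9]{3}", expected, live)
strict = audit(r"app-prd-[0-9]{3}", expected, live)
print("loose pattern :", loose)
print("strict pattern:", strict)
```

With the loose pattern the dev host shows up as an unexpected match; tightening the environment segment leaves only the newly provisioned production host to review. `fullmatch` is used so that a pattern cannot match a mere prefix of a longer hostname.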


🎯 Key Takeaway

GVMs are fragile with hostname regex. Use custom machine attributes instead. Audit actual resolution weekly to catch drifts before jobs fail.

● Production incident · Post-mortem · Severity: high

The Interview Answer That Didn't Match Production

Symptom

The candidate answered: 'ON_ICE, because I want the job to wait until the next cycle after the migration.' That's technically correct. But the interviewer wanted to hear 'ON_HOLD, because after the migration finishes, we want the job to run immediately, not wait until midnight.'

Assumption

The candidate memorised definitions but never applied them to real operations. They didn't understand the operational consequence of the difference.

Root cause

ON_HOLD: release triggers an immediate start if conditions are currently true. ON_ICE: release requires time conditions to reoccur in the next scheduling cycle. During a database migration at 2 PM, a job that normally runs at midnight is held. After the migration completes at 4 PM:
- If ON_HOLD: release runs the job at 4 PM, provided its starting conditions are met at that point (good: you want validation now)
- If ON_ICE: release does nothing until midnight (bad: you wait 8 hours to validate)

Fix

The candidate learned the rule: ON_HOLD for temporary pauses during business hours where you want immediate resume. ON_ICE for permanent schedule changes or when you don't want out-of-cycle runs. Interview tip: Always follow definition with 'In production, I would use ON_HOLD when... and ON_ICE when...'

Key lesson

Memorised definitions are not enough. Apply them to real scenarios.
ON_HOLD = immediate resume. ON_ICE = next scheduled cycle.
Database migrations: ON_HOLD. Schedule changes: ON_ICE.
Interviewers probe with 'when would you use this?' — always have an example.

Production debug guide: the 'walk me through how you'd fix this' questions (4 entries)

Symptom · 01

Job in PEND_MACH status at 2 AM

→

Fix

Step 1: SSH to agent machine. df -h (full disk is #1 cause). Step 2: ps -ef | grep autosys (agent running?). Step 3: Check network: telnet server 7520. Answer: Most likely full disk stopping agent.

Symptom · 02

Job shows SUCCESS but data didn't update

→

Fix

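Look for sqlplus without error checking. Check std_out_file for ORA- errors. Answer: sqlplus returns 0 on SQL errors unless the script says otherwise. Add WHENEVER SQLERROR EXIT SQL.SQLCODE to the SQL script, or wrap the call in a script that greps for ORA- and exits non-zero.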

Symptom · 03

File Watcher triggered on empty file

→

Fix

Check watch_file_min_size. Default is 0. Increase to 1024+. Answer: Upstream wrote lock file first.

Symptom · 04

SAP job stuck PENDING, no error

→

Fix

XBP user password expired or account locked. Check with Basis team. Answer: AutoSys can't see SAP auth failures.
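For the 'SUCCESS but data didn't update' entry above, the wrapper idea can be sketched as an output scanner that turns Oracle error codes into a non-zero exit code. The function names are hypothetical, and the pattern covers only the ORA-, SP2-, and PLS- prefixes; extend it for whatever your scripts emit.

```python
import re
import sys

# Oracle-style error codes such as ORA-00942, SP2-0734, PLS-00201.
_ERROR_RE = re.compile(r"\b(?:ORA|SP2|PLS)-\d{4,5}\b")


def find_sql_errors(output: str) -> list:
    """Return every Oracle-style error code found in captured sqlplus output."""
    return _ERROR_RE.findall(output)


def exit_code_for(output: str) -> int:
    """Exit code the wrapper should return to AutoSys: 1 if any error was seen."""
    return 1 if find_sql_errors(output) else 0


if __name__ == "__main__":
    # Usage sketch: sqlplus ... @load.sql | python check_sql_output.py
    captured = sys.stdin.read()
    sys.stdout.write(captured)  # keep the original output in std_out_file
    sys.exit(exit_code_for(captured))
```

With a wrapper like this in the pipeline (and pipefail enabled, or the wrapper as the last command), the job's exit code reflects SQL errors, so AutoSys marks it FAILURE instead of SUCCESS.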

★ Interview Command Recall: Must-Know Syntax

You will be asked these exact commands. Know them cold.

Back up all job definitions

Immediate action

Use autorep with -q flag

Commands

autorep -J % -q > backup_$(date +%Y%m%d).jil

Fix now

This is a complete backup in JIL format


AutoSys Interview Topics — What to Expect by Level

Topic area	Junior expected depth	Mid-level expected depth	Senior expected depth
Architecture	Name the components	Explain what each does, failure modes	Design HA, predict failure cascades
JIL commands	Basic insert/update/delete syntax	autorep flags, backup strategies	Complex JIL with conditions, variables
Status codes	Recognise SU/FA/RU/IN	PEND_MACH causes, ON_HOLD vs ON_ICE	Recovery procedures for each status
Scheduling	date_conditions, start_times	run_window, run_calendar	Complex calendars, timezone handling
Fault tolerance	n_retrys definition	box_terminator, term_run_time	HA design, recovery strategy
Troubleshooting	Check logs command	Systematic diagnosis workflow	Root cause analysis, prevention


Key takeaways

ON_HOLD vs ON_ICE: OFF_HOLD starts immediately. OFF_ICE waits for next schedule. This is tested in almost every interview.

autorep flags: -s (summary, the default), -d (detail), -q (JIL dump), -r (a specific past run, for example -r -1). Know them cold.

PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.

date_conditions defaults to 0 (disabled). Most people assume it's 1. That's the trap.

FORCE_STARTJOB bypasses ALL conditions (time AND dependencies). STARTJOB respects them.

box_terminator: 1 on validation only. Never on optional jobs.

Senior answers include trade-offs: 'it depends' + comparison of approaches.

Symptom

Interviewer asks 'design an EOD batch workflow'. Candidate stalls or gives a flat list of jobs without hierarchy, error handling, or validation.

Fix

Have a pattern ready: master box → section boxes (extract/transform/load/report) → CMD jobs. Include pre-check validation as box_terminator. Include post-check verification. Mention n_retrys on network I/O jobs. This shows you've built real workflows.
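The pattern above can be made concrete by generating the JIL for a cut-down version of it. This is a sketch with assumptions: the job names, script paths, machine, owner, and the 18:00 weekday schedule are all hypothetical, and only the extract and load sections are shown to keep it short. The attributes used (job_type, box_name, condition, box_terminator, n_retrys, alarm_if_fail, date_conditions, days_of_week, start_times) are the ones discussed in this guide.

```python
def jil_job(name: str, job_type: str, **attrs: object) -> str:
    """Render one insert_job stanza; attributes are emitted in the order given."""
    lines = [f"insert_job: {name}", f"job_type: {job_type}"]
    lines += [f"{key}: {value}" for key, value in attrs.items()]
    return "\n".join(lines) + "\n"


def eod_workflow(machine: str = "etl-host-01", owner: str = "batch") -> str:
    """Master box -> section boxes -> CMD jobs, with pre-check and post-check."""
    stanzas = [
        # Level 1: the master box owns the schedule.
        jil_job("EOD_MASTER", "BOX", owner=owner, date_conditions=1,
                days_of_week="mo,tu,we,th,fr", start_times='"18:00"'),
        # Level 2: one section box per logical stage.
        jil_job("EOD_EXTRACT", "BOX", box_name="EOD_MASTER", owner=owner),
        # Level 3: the pre-check stops the section if validation fails.
        jil_job("EOD_EXTRACT_PRECHECK", "CMD", box_name="EOD_EXTRACT",
                command="/opt/batch/precheck.sh", machine=machine, owner=owner,
                box_terminator=1, alarm_if_fail=1),
        # Network I/O job: retried automatically on failure.
        jil_job("EOD_EXTRACT_PULL", "CMD", box_name="EOD_EXTRACT",
                command="/opt/batch/pull.sh", machine=machine, owner=owner,
                condition="success(EOD_EXTRACT_PRECHECK)", n_retrys=2,
                alarm_if_fail=1),
        jil_job("EOD_LOAD", "BOX", box_name="EOD_MASTER", owner=owner,
                condition="success(EOD_EXTRACT)"),
        # Post-check verifies the output before the master box can succeed.
        jil_job("EOD_LOAD_POSTCHECK", "CMD", box_name="EOD_LOAD",
                command="/opt/batch/postcheck.sh", machine=machine, owner=owner,
                alarm_if_fail=1),
    ]
    return "\n".join(stanzas)


if __name__ == "__main__":
    print(eod_workflow())
```

Only the master box carries date_conditions and start_times; everything below it starts from box membership and conditions. In an interview, walking through that choice is more convincing than listing attributes.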

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Junior): What is AutoSys and what makes it better than cron for enterprise batch ...

Q02 (Senior): Explain the AutoSys architecture and the role of each component.

Q03 (Senior): What is the difference between ON_HOLD and ON_ICE? What happens when you...

Q04 (Senior): A job is in PEND_MACH status. Walk me through how you diagnose and fix i...

Q05 (Junior): What does date_conditions do and what is its default value?

Q06 (Senior): What is box_terminator and when would you use it?

Q07 (Senior): How do you design an AutoSys workflow for a complex end-of-day batch run...

Q08 (Senior): What is the difference between FORCE_STARTJOB and STARTJOB?

Q09 (Senior): How would you pass a record count from one AutoSys job to the next?

Q10 (Senior): Walk me through how you recover from a BOX job that went to FAILURE at 3...

Q01 of 10JUNIOR

What is AutoSys and what makes it better than cron for enterprise batch processing?

ANSWER

AutoSys is an enterprise workload automation platform. Better than cron because: cross-server dependencies (cron can't make job B wait for job A on another server), centralised monitoring (cron logs are per-server), alerting (cron only emails errors), audit trails (who changed what and when), retry logic (n_retrys), file-watching (event-driven), and global variables (cross-job data passing). Cron is fine for single-server, independent jobs. AutoSys is for multi-server, dependent workflows with SLAs and compliance requirements.


