AutoSys Interview: 50 Questions With Answers
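- ON_HOLD vs ON_ICE: releasing a held job (OFF_HOLD) starts it immediately if its conditions are met. Releasing an iced job (OFF_ICE) waits for the next scheduling cycle. This is tested in almost every interview and is the most common wrong answer.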
- autorep flags: default (status), -d (detail), -q (JIL dump), -s (summary), -run (last N runs). Know them cold.
- PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.
- date_conditions defaults to 0 (time scheduling disabled). Most people assume it's 1.
- FORCE_STARTJOB bypasses ALL conditions (time AND dependencies). STARTJOB respects them.
- box_terminator: 1 stops entire box when job fails. Use on validation jobs only.
- Global variables: SET_GLOBAL writes, autostatus -G reads, value() (abbreviated v()) in JIL conditions.
Interview Command Recall — Must-Know Syntax
Back up all job definitions
    autorep -J % -q > backup_$(date +%Y%m%d).jil
Check why job isn't starting
    autorep -J JOBNAME -q
    autorep -J JOBNAME -d
Force-start a job
    sendevent -E FORCE_STARTJOB -J JOBNAME
    sendevent -E CHANGE_STATUS -J JOBNAME -s SUCCESS (to unblock downstream)
Set a global variable
    sendevent -E SET_GLOBAL -G "COUNT=100"
    autostatus -G COUNT
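The global-variable commands can be combined into a small handoff between two jobs. The following is a minimal sketch, not a definitive implementation: the function name publish_count_cmd, the variable name RECORD_COUNT, and the sample file are hypothetical, and the script only prints the sendevent command instead of running it, so it works without an AutoSys installation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the SET_GLOBAL command an upstream job could run after counting
# the records it produced. RECORD_COUNT is a hypothetical variable name.
publish_count_cmd() {
  local file="$1" count
  count="$(wc -l < "$file" | tr -d ' ')"
  printf 'sendevent -E SET_GLOBAL -G "RECORD_COUNT=%s"\n' "$count"
}

# Stand-in for the upstream job's output file (three records).
sample="$(mktemp)"
printf 'a\nb\nc\n' > "$sample"
publish_count_cmd "$sample"

# A downstream script would read the value back with:
#   autostatus -G RECORD_COUNT
```

Printing the command keeps the sketch safe to run anywhere; on a real scheduler host the upstream script would execute the printed line as its last step.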
AutoSys interviews are specific. Interviewers know the tool. Vague answers about 'scheduling jobs' fail.
This guide assumes you've worked through the other articles in this track. It's your review. The questions are organised from foundational to advanced. The answers are complete, not truncated.
The most common wrong answer? ON_HOLD vs ON_ICE. That question appears in almost every interview. Get it right.
Architecture and concepts
These questions test whether you understand what AutoSys actually is and how it works internally. They're usually early in the interview to establish baseline knowledge.
Q: What is AutoSys and what problem does it solve?
A: AutoSys is Broadcom's enterprise workload automation platform for scheduling, monitoring, and orchestrating batch jobs across multiple servers. It solves the scalability problems of cron: dependency management, centralised visibility, alerting, audit trails, and multi-server coordination.

Q: What are the main components of AutoSys architecture?
A: Event Server (database storing all definitions and events), Event Processor (scheduling daemon that evaluates conditions and triggers agents), Remote Agents (lightweight processes on each target machine), and Clients (CLI tools + WCC web UI).

Q: What happens when the Event Processor goes down?
A: Job triggering stops. Jobs that are currently RUNNING continue to completion (the agent handles execution independently), but no new jobs will be triggered until the Event Processor is restarted.
JIL and job operations
These test practical JIL knowledge — what interviewers really want to know is whether you've actually used the tool, not just read about it.
Q: What is the difference between insert_job and update_job?
A: insert_job creates a new job definition and fails if the job already exists. update_job modifies an existing job (partial update, only changed attributes) and fails if the job doesn't exist.

Q: What is the difference between delete_job and delete_box?
A: delete_job on a box removes only the box, leaving inner jobs as standalone. delete_box removes the box AND all its inner jobs.

Q: How do you back up AutoSys job definitions?
A: autorep -J % -q > backup_$(date +%Y%m%d).jil
This dumps all job definitions in JIL format to a file.

Q: How do you view the JIL definition of an existing job?
A: autorep -J jobname -q

Q: What does FORCE_STARTJOB do differently from STARTJOB?
A: FORCE_STARTJOB starts the job immediately, bypassing all conditions (date_conditions, start_times, condition attribute). STARTJOB only triggers if conditions are currently met.
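The insert_job / update_job distinction is easier to remember with the two JIL files side by side. This is a minimal sketch that only generates the files: the job name demo_extract, the script path, and the log path are hypothetical, and machine01 is borrowed from the example host used elsewhere in this guide. Loading the files needs a live AutoSys client, so that step is left as a comment.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Full definition: insert_job fails if demo_extract already exists.
# date_conditions is set to 1 explicitly because it defaults to 0.
cat > "$workdir/insert_demo.jil" <<'EOF'
insert_job: demo_extract
job_type: CMD
command: /opt/batch/extract.sh
machine: machine01
date_conditions: 1
days_of_week: all
start_times: "02:00"
std_err_file: /tmp/demo_extract.err
EOF

# Partial definition: update_job lists only the attributes that change
# and fails if demo_extract does not exist.
cat > "$workdir/update_demo.jil" <<'EOF'
update_job: demo_extract
start_times: "03:30"
EOF

echo "insert file lines: $(wc -l < "$workdir/insert_demo.jil" | tr -d ' ')"
echo "update file lines: $(wc -l < "$workdir/update_demo.jil" | tr -d ' ')"

# On a host with the AutoSys client installed (assumption), load them with:
#   jil < "$workdir/insert_demo.jil"
#   jil < "$workdir/update_demo.jil"
```

The update file is deliberately two lines long: every attribute not listed keeps its current value.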
Status codes and troubleshooting
These test operational knowledge — have you actually been on-call for an AutoSys environment? Interviewers love status code questions because they separate theory from practice.
Q: What does PEND_MACH mean and what usually causes it?
A: PEND_MACH (PE) means the Remote Agent on the target machine is unavailable. Most common cause: the agent machine's filesystem is 100% full, stopping the agent service. Check disk space first: ssh machine01 'df -h'

Q: What is the difference between ON_HOLD and ON_ICE?
A: ON_HOLD: releasing (OFF_HOLD) starts the job immediately if conditions are currently met. ON_ICE: releasing (OFF_ICE) makes the job wait for conditions to reoccur in the next scheduling cycle; it does not start immediately.

Q: A job was failing every night for a week. What's your troubleshooting approach?
A:
1. Check std_err_file for the error pattern
2. Check if it's always the same exit code (consistent root cause)
3. Check autorep -J jobname -run 7 to compare recent runs
4. Check if it correlates with system events (deployments, maintenance)
5. Engage the application team who owns the script

Q: How do you unblock downstream jobs after manually fixing a failed job?
A: sendevent -E CHANGE_STATUS -J fixed_job -s SUCCESS
This marks the job SUCCESS so all downstream success() conditions are met.
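The "check disk space first" step can be scripted so it gives a yes/no answer instead of a table to read. This is a sketch under stated assumptions: it expects POSIX `df -P` column layout, mount points without spaces, and the function name full_filesystems and the 95% threshold are illustrative choices, not AutoSys features.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print mount points whose usage is at or above a threshold (default 95%).
# Reads `df -P` style output on stdin so it can be fed from a remote host.
full_filesystems() {
  local threshold="${1:-95}"
  awk -v limit="$threshold" '
    NR > 1 {                 # skip the header row
      use = $5
      sub(/%/, "", use)      # "97%" -> "97"
      if (use + 0 >= limit) print $6 " " $5
    }'
}

# Local check. For an agent host it could be fed remotely instead:
#   ssh machine01 'df -P' | full_filesystems 95
df -P | full_filesystems 95 || true
```

No output means no filesystem crossed the threshold, which points the investigation away from disk space and towards the agent process or the network.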
Advanced and scenario questions
These test whether you can reason about AutoSys in complex real-world situations. Senior-level interviews focus heavily on this section.
Q: Design an AutoSys workflow for end-of-day batch processing.
A: Use a 3-level hierarchy: master box (overall schedule) → section boxes (logical groupings: extract, transform, load, report) → CMD jobs inside each section box. Include a pre-check job as box_terminator, n_retrys on I/O jobs, alarm_if_fail on all critical jobs, and a post-check job to validate output.

Q: What is box_terminator and when would you use it?
A: box_terminator: 1 on a job means if that job fails, the entire parent box immediately moves to FAILURE and all remaining inner jobs are skipped. Use it on validation/pre-check jobs whose failure makes all downstream processing pointless.

Q: How do you handle a scenario where an upstream file sometimes arrives late?
A: Use a File Watcher job (job_type: FW) with a run_window covering the expected arrival period and an appropriate minimum file size (watch_file_min_size). The downstream jobs condition on success(file_watcher_job). This way processing starts as soon as the file arrives rather than at a fixed time that may be too early.

Q: How do you pass data between AutoSys jobs?
A: Using global variables: the upstream script runs sendevent -E SET_GLOBAL -G "VAR_NAME=value". Downstream jobs read it via autostatus -G VAR_NAME or reference it in JIL conditions with value(VAR_NAME).
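A compact JIL sketch helps when you are asked to draw the end-of-day design on a whiteboard. The version below is deliberately simplified: it uses a single box rather than the full three-level hierarchy, and it omits run_window and the minimum file size on the file watcher. Every job name, path, and schedule is hypothetical, and the script only writes the JIL to a temporary file.

```shell
#!/usr/bin/env bash
set -euo pipefail

jil_file="$(mktemp)"

cat > "$jil_file" <<'EOF'
/* Master box: owns the schedule for the whole run */
insert_job: eod_master_box
job_type: BOX
date_conditions: 1
days_of_week: mo,tu,we,th,fr
start_times: "22:00"

/* Pre-check: if this fails, the box fails and the rest is skipped */
insert_job: eod_precheck
job_type: CMD
box_name: eod_master_box
command: /opt/batch/precheck.sh
machine: machine01
box_terminator: 1
alarm_if_fail: 1

/* File watcher: succeeds once the upstream file has arrived */
insert_job: eod_wait_for_feed
job_type: FW
box_name: eod_master_box
machine: machine01
watch_file: /data/inbound/feed.csv
watch_interval: 60
condition: success(eod_precheck)

/* Load: starts as soon as the file watcher succeeds */
insert_job: eod_load
job_type: CMD
box_name: eod_master_box
command: /opt/batch/load.sh
machine: machine01
condition: success(eod_wait_for_feed)
n_retrys: 2
alarm_if_fail: 1
EOF

echo "jobs defined: $(grep -c '^insert_job:' "$jil_file")"
# With the AutoSys client available (assumption): jil < "$jil_file"
```

Only the box carries date_conditions and start_times; the inner jobs are driven by the box starting and by their condition attributes, which is what lets the load begin whenever the file shows up.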
| Topic area | Junior expected depth | Mid-level expected depth | Senior expected depth |
|---|---|---|---|
| Architecture | Name the components | Explain what each does, failure modes | Design HA, predict failure cascades |
| JIL commands | Basic insert/update/delete syntax | autorep flags, backup strategies | Complex JIL with conditions, variables |
| Status codes | Recognise SU/FA/RU/IN | PEND_MACH causes, ON_HOLD vs ON_ICE | Recovery procedures for each status |
| Scheduling | date_conditions, start_times | run_window, run_calendar | Complex calendars, timezone handling |
| Fault tolerance | n_retrys definition | box_terminator, term_run_time | HA design, recovery strategy |
| Troubleshooting | Check logs command | Systematic diagnosis workflow | Root cause analysis, prevention |
🎯 Key Takeaways
- ON_HOLD vs ON_ICE: OFF_HOLD starts immediately. OFF_ICE waits for next schedule. This is tested in almost every interview.
- autorep flags: default (status), -d (detail), -q (JIL dump), -s (summary), -run (last N runs). Know them cold.
- PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.
- date_conditions defaults to 0 (disabled). Most people assume it's 1. That's the trap.
- FORCE_STARTJOB bypasses ALL conditions (time AND dependencies). STARTJOB respects them.
- box_terminator: 1 on validation only. Never on optional jobs.
- Senior answers include trade-offs: 'it depends' + comparison of approaches.
- Have a real example ready for every concept: 'I used ON_HOLD when...'
Interview Questions on This Topic
- What is AutoSys and what makes it better than cron for enterprise batch processing? (Junior)
- Explain the AutoSys architecture and the role of each component. (Mid-level)
- What is the difference between ON_HOLD and ON_ICE? What happens when you release each? (Mid-level)
- A job is in PEND_MACH status. Walk me through how you diagnose and fix it. (Mid-level)
- What does date_conditions do and what is its default value? (Junior)
- What is box_terminator and when would you use it? (Mid-level)
- How do you design an AutoSys workflow for a complex end-of-day batch run? (Senior)
- What is the difference between FORCE_STARTJOB and STARTJOB? (Mid-level)
- How would you pass a record count from one AutoSys job to the next? (Senior)
- Walk me through how you recover from a BOX job that went to FAILURE at 3 AM. (Senior)
Frequently Asked Questions
What AutoSys questions come up most in interviews?
The most common questions, in order:
1. ON_HOLD vs ON_ICE (appears in almost every interview)
2. PEND_MACH causes and resolution
3. date_conditions default and meaning
4. FORCE_STARTJOB vs STARTJOB
5. box_terminator definition and use case
6. Design an end-of-day batch workflow
If you can answer these six questions confidently, you'll pass most AutoSys interviews. The rest are details.
What is the most commonly asked AutoSys interview question?
The ON_HOLD vs ON_ICE question appears in nearly every AutoSys interview. The key answer: releasing from ON_HOLD starts the job immediately if conditions are currently met; releasing from ON_ICE makes the job wait for conditions to reoccur in the next scheduling cycle — it does not start immediately.
Example to solidify: A job runs at midnight. At 2 PM, you put it ON_ICE. At 3 PM, you release it. It runs at midnight, not at 3 PM.
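The same example can be expressed as the events an operator would send. This is a sketch, assuming the standard sendevent event names JOB_ON_HOLD, JOB_OFF_HOLD, JOB_ON_ICE, and JOB_OFF_ICE; the helper function send, the DRY_RUN switch, and the job name nightly_job are hypothetical. By default it only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the sendevent command by default; run it only when DRY_RUN=0.
send() {
  local event="$1" job="$2"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    sendevent -E "$event" -J "$job"
  else
    echo "sendevent -E $event -J $job"
  fi
}

job="nightly_job"   # hypothetical job scheduled for midnight

# Hold path: on release, the job starts right away if its conditions are met.
send JOB_ON_HOLD  "$job"
send JOB_OFF_HOLD "$job"

# Ice path: on release, the job waits for its next scheduled start (midnight).
send JOB_ON_ICE  "$job"
send JOB_OFF_ICE "$job"
```

The commands look symmetrical, which is exactly why the question trips people up: the difference is in what the scheduler does after the release event, not in how you send it.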
Interviewers love this question because 60% of candidates get it wrong or give a vague answer.
Do I need hands-on AutoSys experience to pass the interview?
For operational roles (SRE, batch operations, production support), yes — interviewers ask specific command syntax and scenario questions that require real experience. Studying concepts is necessary but not sufficient.
For architecture or design roles (where the job posting says 'AutoSys knowledge preferred'), you may pass with strong conceptual understanding and transferable experience from other schedulers (Control-M, TWS).
If you don't have direct experience: setting up a trial environment (Broadcom offers developer licenses) or documenting your company's existing AutoSys setup (even just reading JIL definitions) is valuable.
What is the PEND_MACH answer in AutoSys interviews?
PEND_MACH means the Remote Agent on the target machine is unavailable.
Causes (in order of likelihood):
1. Full disk on agent machine (agent service stopped)
2. Agent service not running
3. Machine offline
4. Network issue
5. Firewall blocking port 7520
Diagnosis:
1. SSH to agent: df -h
2. Check agent: ps -ef | grep autosys
3. Test port: telnet agent-host 7520
Fix:
- Full disk: sudo rm /tmp/autosys_logs/* (old logs), restart agent
- Agent stopped: start agent service
- Network: engage network team
Interview tip: Say 'disk space is the most common cause — check df -h first' to show operational experience.
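The diagnosis order can be written down as a small decision function, which is a useful way to rehearse the answer. This is an illustrative sketch only: pend_mach_triage is a hypothetical helper, its three inputs are observations you would gather by hand with df -h, ps -ef, and a port test, and the final "healthy" branch is an assumption added so the function always returns something.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Classify a PEND_MACH incident from three observations, checked in the
# same order as the diagnosis steps: disk, agent process, network port.
#   $1 = highest filesystem usage on the agent host, in percent
#   $2 = "yes" if the agent process is running, otherwise "no"
#   $3 = "yes" if the agent port accepts connections, otherwise "no"
pend_mach_triage() {
  local disk_pct="$1" agent_running="$2" port_open="$3"
  if [ "$disk_pct" -ge 100 ]; then
    echo "full disk: free space, then restart the agent"
  elif [ "$agent_running" != "yes" ]; then
    echo "agent stopped: start the agent service"
  elif [ "$port_open" != "yes" ]; then
    echo "network or firewall: engage the network team"
  else
    echo "agent looks healthy: investigate further"
  fi
}

pend_mach_triage 100 no no
pend_mach_triage 62 no yes
pend_mach_triage 62 yes no
```

Checking disk first matters because a full disk also explains a stopped agent; fixing the agent without freeing space just brings the problem back.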
How do I explain AutoSys to a non-technical interviewer?
AutoSys is an enterprise scheduling and orchestration tool. It runs thousands of batch jobs every night — like payroll, financial reconciliation, data processing — across hundreds of servers.
Think of it like a very sophisticated alarm clock for a company's servers. It wakes up programs at specific times, in a specific order, waits for them to finish, and immediately alerts the team if anything goes wrong. It can also trigger jobs when a file arrives (event-driven) rather than at a fixed time.
Large banks and insurance companies use it because they can't afford for payroll or trade settlement to fail or run out of order.
This explanation works for hiring managers from non-technical backgrounds (HR, staffing agencies).
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.