
AutoSys Interview: 50 Questions With Answers

📍 Part of: AutoSys → Topic 30 of 30
⚙️ Intermediate — basic DevOps knowledge assumed
In this tutorial, you'll learn
  • ON_HOLD vs ON_ICE: OFF_HOLD starts immediately. OFF_ICE waits for next schedule. This is tested in almost every interview.
  • autorep flags: -s (summary, the default), -d (detail), -q (JIL dump), -r (a specific earlier run, e.g. -r -1 for the previous run). Know them cold.
  • PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.
[Figure: AutoSys Interview Topic Map, grouped by category]
  • Architecture: Event Server vs Processor, component roles, HA / shadow server, PEND_MACH causes
  • JIL Commands: insert vs update, delete vs delete_box, autorep -q backup, override_job
  • Job Types: CMD / BOX / FW differences, box_name attribute, box_terminator use, FW min_file_size
  • Status Codes: all abbreviations, ON_HOLD vs ON_ICE, ACTIVATED meaning, TERMINATED causes
  • Scheduling: date_conditions gate, run_window purpose, run_calendar setup, timezone handling
  • Troubleshooting: failure workflow, PEND_MACH → disk check, restart procedure, CHANGE_STATUS use
Quick Answer
  • ON_HOLD: releasing starts job immediately if conditions met. ON_ICE: releasing waits for next scheduling cycle. Most common wrong answer.
  • PEND_MACH = agent unreachable. First check: disk space on agent (df -h). 90% of cases.
  • date_conditions defaults to 0 (time scheduling disabled). Most people assume it's 1.
  • FORCE_STARTJOB bypasses ALL conditions (time AND dependencies). STARTJOB respects them.
  • box_terminator: 1 stops entire box when job fails. Use on validation jobs only.
  • Global variables: SET_GLOBAL writes, autostatus -G reads, value() (abbreviated v()) in JIL conditions.
🚨 START HERE

Interview Command Recall — Must-Know Syntax

You will be asked these exact commands. Know them cold.

Back up all job definitions

Immediate Action: Use autorep with the -q flag
Commands:
autorep -J % -q > backup_$(date +%Y%m%d).jil
Fix Now: This produces a complete backup in JIL format

Check why job isn't starting

Immediate Action: View the JIL definition and status
Commands:
autorep -J JOBNAME -q
autorep -J JOBNAME -d
Fix Now: -q shows conditions, -d shows status detail

Force-start a job

Immediate Action: Use sendevent FORCE_STARTJOB
Commands:
sendevent -E FORCE_STARTJOB -J JOBNAME
sendevent -E CHANGE_STATUS -J JOBNAME -s SUCCESS   # marks the job SUCCESS to unblock downstream jobs
Fix Now: FORCE_STARTJOB bypasses ALL conditions

Set a global variable

Immediate Action: Use sendevent SET_GLOBAL
Commands:
sendevent -E SET_GLOBAL -G "COUNT=100"
autostatus -G COUNT
Fix Now: No spaces around =
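The "no spaces around =" rule is easy to break when the value comes from a script. A minimal sketch of a guard, assuming a hypothetical helper named set_global (not an AutoSys command) that validates the NAME=value pair and, by default, only prints the sendevent call it would run:

```bash
# Hypothetical helper: validate NAME=value before handing it to sendevent.
# Function names and the DRY_RUN switch are illustrative assumptions.

# Returns 0 when the argument looks like NAME=value with no space next to "=".
valid_global_assignment() {
    case "$1" in
        *" ="*|*"= "*) return 1 ;;   # a space directly before or after "="
        [A-Za-z_]*=*)  return 0 ;;   # starts with a letter or underscore, contains "="
        *)             return 1 ;;   # no "=" at all, or a bad first character
    esac
}

# Prints the command unless DRY_RUN=0, in which case it calls sendevent.
set_global() {
    if ! valid_global_assignment "$1"; then
        echo "refusing: expected NAME=value with no spaces around '=': $1" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "0" ]; then
        sendevent -E SET_GLOBAL -G "$1"
    else
        echo "sendevent -E SET_GLOBAL -G \"$1\""
    fi
}
```

Calling set_global "COUNT=100" prints the command for review, while set_global "COUNT = 100" is rejected before anything reaches the scheduler.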
Production Incident

The Interview Answer That Didn't Match Production

A candidate nailed the ON_HOLD vs ON_ICE definition. But when asked 'which one would you use to pause a job temporarily during a database migration?', they guessed wrong. Theory without application failed the interview.
Symptom: The candidate answered: 'ON_ICE, because I want the job to wait until the next cycle after the migration.' The definition behind that answer is correct, but the interviewer wanted to hear 'ON_HOLD, because after the migration finishes, we want the job to run immediately, not wait until midnight.'
Assumption: The candidate memorised definitions but never applied them to real operations, so they missed the operational consequence of the difference.
Root cause: ON_HOLD: release triggers an immediate start if the starting conditions are currently true. ON_ICE: release requires the time conditions to reoccur in the next scheduling cycle. During a database migration at 2 PM, a job that normally runs at midnight is held. After the migration completes at 4 PM:
  • If ON_HOLD: release runs the job at 4 PM, provided its starting conditions are met at that point (good: you want validation now)
  • If ON_ICE: release does nothing until midnight (bad: you wait 8 hours to validate)
Fix: The candidate learned the rule: ON_HOLD for temporary pauses during business hours where you want immediate resume. ON_ICE for permanent schedule changes or when you don't want out-of-cycle runs. Interview tip: Always follow the definition with 'In production, I would use ON_HOLD when... and ON_ICE when...'
Key Lesson
  • Memorised definitions are not enough. Apply them to real scenarios.
  • ON_HOLD = immediate resume. ON_ICE = next scheduled cycle.
  • Database migrations: ON_HOLD. Schedule changes: ON_ICE.
  • Interviewers probe with 'when would you use this?', so always have an example.
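To make the hold/ice choice concrete, here is a small sketch of the commands involved. The sendevent event names are JOB_ON_HOLD, JOB_OFF_HOLD, JOB_ON_ICE and JOB_OFF_ICE (the article's OFF_HOLD and OFF_ICE are shorthand). The wrapper functions, the DRY_RUN switch and the job name eod_validate are assumptions for illustration only.

```bash
# Sketch only: wraps the hold/ice events so the choice is explicit at the call site.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

run_or_print() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# usage: pause_job hold|ice JOBNAME
pause_job() {
    case "$1" in
        hold) run_or_print sendevent -E JOB_ON_HOLD -J "$2" ;;
        ice)  run_or_print sendevent -E JOB_ON_ICE -J "$2" ;;
        *)    echo "usage: pause_job hold|ice JOBNAME" >&2; return 2 ;;
    esac
}

# usage: release_job hold|ice JOBNAME
release_job() {
    case "$1" in
        hold) run_or_print sendevent -E JOB_OFF_HOLD -J "$2" ;;
        ice)  run_or_print sendevent -E JOB_OFF_ICE -J "$2" ;;
        *)    echo "usage: release_job hold|ice JOBNAME" >&2; return 2 ;;
    esac
}
```

For the migration scenario, pause_job hold eod_validate before the window and release_job hold eod_validate afterwards expresses the "resume as soon as conditions allow" intent; swapping hold for ice expresses "skip until the next cycle".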
Production Debug Guide

The 'walk me through how you'd fix this' questions

Job in PEND_MACH status at 2 AM: Step 1: SSH to the agent machine and run df -h (a full disk is the #1 cause). Step 2: ps -ef | grep autosys (is the agent running?). Step 3: Check the network: telnet server 7520. Answer: Most likely a full disk has stopped the agent.
Job shows SUCCESS but data didn't update: Look for sqlplus calls without error checking. Check std_out_file for ORA- errors. Answer: sqlplus exits 0 on SQL errors unless the script sets WHENEVER SQLERROR EXIT. Always wrap it in a script that greps for ORA-.
File Watcher triggered on empty file: Check min_file_size (the watch_file_min_size JIL attribute). Default is 0. Increase to 1024+. Answer: Upstream wrote a lock file first.
SAP job stuck PENDING, no error: XBP user password expired or account locked. Check with the Basis team. Answer: AutoSys can't see SAP auth failures.
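The "SUCCESS but data didn't update" case is usually fixed with a wrapper that fails the job when the SQL log contains errors. A minimal sketch, assuming hypothetical function names and that the SQL script performs its own CONNECT (hence /nolog):

```bash
# Returns non-zero when a sqlplus log contains Oracle (ORA-) or SQL*Plus (SP2-) errors.
# usage: sql_log_clean LOGFILE
sql_log_clean() {
    if [ ! -r "$1" ]; then
        echo "cannot read log file: $1" >&2
        return 2
    fi
    if grep -Eq 'ORA-[0-9]+|SP2-[0-9]+' "$1"; then
        # Show the offending lines with line numbers so they land in std_err_file.
        grep -En 'ORA-[0-9]+|SP2-[0-9]+' "$1" >&2
        return 1
    fi
    return 0
}

# Hypothetical job wrapper: run the script, keep the log, then scan it.
# usage: run_sql_job SCRIPT.sql LOGFILE
run_sql_job() {
    sqlplus -S -L /nolog "@$1" > "$2" 2>&1 || return 1
    sql_log_clean "$2"
}
```

The job's command attribute would then point at a script calling run_sql_job, so a logged ORA- error produces a non-zero exit code and AutoSys marks the job FAILURE. Adding WHENEVER SQLERROR EXIT FAILURE at the top of the SQL script is a complementary safeguard.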

AutoSys interviews are specific. Interviewers know the tool. Vague answers about 'scheduling jobs' fail.

This guide assumes you've worked through the other articles in this track. It's your review. The questions are organised from foundational to advanced. The answers are complete, not truncated.

The most common wrong answer? ON_HOLD vs ON_ICE. That question appears in almost every interview. Get it right.

Architecture and concepts

These questions test whether you understand what AutoSys actually is and how it works internally. They're usually early in the interview to establish baseline knowledge.

architecture_qa.txt · BASH
Q: What is AutoSys and what problem does it solve?
A: AutoSys is Broadcom's enterprise workload automation platform for scheduling,
   monitoring, and orchestrating batch jobs across multiple servers. It solves the
   scalability problems of cron: dependency management, centralised visibility, alerting,
   audit trails, and multi-server coordination.

Q: What are the main components of AutoSys architecture?
A: Event Server (database storing all definitions and events), Event Processor
   (scheduling daemon that evaluates conditions and triggers agents), Remote Agents
   (lightweight processes on each target machine), and Clients (CLI tools + WCC web UI).

Q: What happens when the Event Processor goes down?
A: Job triggering stops. Jobs that are currently RUNNING continue to completion (the
   agent handles execution independently), but no new jobs will be triggered until
   the Event Processor is restarted.
🔥Interview tip — Event Processor vs Event Server
Interviewers often ask 'what's the difference?' The Event Server is the database (storage). The Event Processor is the daemon (evaluation). One stores state, the other triggers jobs.
📊 Production Insight
A candidate answered 'The Event Processor writes to the Event Server.' That's backwards. The Event Processor reads from the Event Server. The Event Server is written to by agents and sendevent commands. The processor is stateless.
The interviewer asked a follow-up: 'If the Event Server goes down, do running jobs continue?' The candidate didn't know. Answer: Yes — the agent runs jobs independently. But job completion status cannot be written back.
Rule: Know which component does what. If you confuse direction, you fail the architecture section.
🎯 Key Takeaway
Event Server = database (storage). Event Processor = daemon (evaluation).
Event Processor down = no new jobs. Running jobs continue.
Agent down = jobs on that machine go to PEND_MACH. Other agents are fine.
Know the failure modes: silent, not sudden.
Component failure: what happens to jobs?
  • If the Event Processor crashes: Running jobs continue. No new jobs start. Status updates queue.
  • If the Event Server is unreachable: Running jobs continue. Completion status can't be saved. Agent may retry.
  • If the Remote Agent on a machine is down: Jobs on that machine stay pending (PEND_MACH). Other machines unaffected.
  • If the network between server and agent is down: Jobs on that machine go to PEND_MACH. Agent can't start jobs or report status.

JIL and job operations

These test practical JIL knowledge — what interviewers really want to know is whether you've actually used the tool, not just read about it.

jil_operations_qa.txt · BASH
Q: What is the difference between insert_job and update_job?
A: insert_job creates a new job definition — fails if job already exists.
   update_job modifies an existing job (partial update, only changed attributes).
   Fails if job doesn't exist.

Q: What is the difference between delete_job and delete_box?
A: delete_job on a box removes only the box, leaving inner jobs as standalone.
   delete_box removes the box AND all its inner jobs.

Q: How do you back up AutoSys job definitions?
A: autorep -J % -q > backup_$(date +%Y%m%d).jil
   This dumps all job definitions in JIL format to a file.

Q: How do you view the JIL definition of an existing job?
A: autorep -J jobname -q

Q: What does FORCE_STARTJOB do differently from STARTJOB?
A: FORCE_STARTJOB starts the job immediately bypassing all conditions
   (date_conditions, start_times, condition attribute). STARTJOB only triggers
   if conditions are currently met.
⚠ Most missed JIL question: delete_job vs delete_box
On a box: delete_job removes the box container. Inner jobs become standalone. delete_box removes box AND inner jobs. This is a common trick question — if you say 'delete_job removes the box and its jobs', you're wrong.
📊 Production Insight
An operations engineer used delete_job on a production box thinking it would remove all inner jobs. It didn't. The box vanished. All inner jobs became orphaned standalone jobs. They continued running on their own schedules, independent of dependencies.
A trading settlement job ran 4 hours early because its parent box was gone. The box had enforced a start time. Without the box, the job ran at its own start time — which was 2 PM, not 6 PM.
Recovery: regenerate box definition from backup (autorep -J boxname -q had been saved). Reinsert box. Reassociate inner jobs with box_name attributes.
Rule: Always have current JIL backups. autorep -J % -q weekly. Delete box? Use delete_box or expect orphaned jobs.
🎯 Key Takeaway
insert_job vs update_job: create vs modify. delete_job vs delete_box: box-only vs box+children.
Backups: autorep -J % -q > backup.jil — do this weekly.
FORCE_STARTJOB bypasses ALL conditions. STARTJOB respects them.
JIL is case-sensitive on Linux. JOB vs Job are different.
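The weekly backup rule can be scripted so that an empty or failed dump never silently replaces a good one. A sketch under assumptions: the function names and the jil_backup_ file prefix are illustrative, and autorep is expected on the PATH of the account running it.

```bash
# Build a dated backup file name such as ./jil_backup_20240131.jil.
# usage: backup_file_name [DIR]
backup_file_name() {
    printf '%s/jil_backup_%s.jil\n' "${1:-.}" "$(date +%Y%m%d)"
}

# Dump every job definition in JIL format and refuse to keep an empty dump.
# usage: backup_all_jobs [DIR]
backup_all_jobs() {
    out=$(backup_file_name "$1")
    if ! command -v autorep >/dev/null 2>&1; then
        echo "autorep not found on PATH" >&2
        return 127
    fi
    autorep -J % -q > "$out" || return 1
    if [ ! -s "$out" ]; then
        echo "backup $out is empty, removing it" >&2
        rm -f "$out"
        return 1
    fi
    echo "$out"
}
```

Scheduling backup_all_jobs /some/backup/dir from AutoSys itself (or cron) gives a dated file that can be fed back through jil to rebuild a deleted box.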

Status codes and troubleshooting

These test operational knowledge — have you actually been on-call for an AutoSys environment? Interviewers love status code questions because they separate theory from practice.

status_trouble_qa.txt · BASH
Q: What does PEND_MACH mean and what usually causes it?
A: PEND_MACH (PE) means the Remote Agent on the target machine is unavailable.
   Most common cause: the agent machine's filesystem is 100% full, stopping the
   agent service. Check disk space first: ssh machine01 'df -h'

Q: What is the difference between ON_HOLD and ON_ICE?
A: ON_HOLD: releasing (OFF_HOLD) starts the job immediately if conditions are currently met.
   ON_ICE: releasing (OFF_ICE) makes the job wait for conditions to reoccur in the
   next scheduling cycle — it does not start immediately.

Q: A job was failing every night for a week. What's your troubleshooting approach?
A: 1. Check std_err_file for the error pattern
   2. Check if it's always the same exit code (consistent root cause)
   3. Run autorep -J jobname -r -1, -r -2, and so on to compare recent runs
   4. Check if it correlates with system events (deployments, maintenance)
   5. Engage the application team who owns the script

Q: How do you unblock downstream jobs after manually fixing a failed job?
A: sendevent -E CHANGE_STATUS -J fixed_job -s SUCCESS
   This marks the job SUCCESS so all downstream success() conditions are met.
⚠ ON_HOLD vs ON_ICE — the most common wrong answer
Most candidates say 'they're the same'. They're not. OFF_HOLD starts immediately. OFF_ICE waits for next schedule. If you get this wrong, you fail the status section. Know it cold.
📊 Production Insight
A candidate correctly defined ON_HOLD vs ON_ICE. Then the interviewer asked: 'You have a job that runs at midnight. At 2 PM, you put it ON_ICE. At 3 PM, you release it. When does it run?'
The candidate thought: immediately. Wrong. ON_ICE release waits for the next scheduling cycle — midnight. The job ran at midnight, not 3 PM.
The candidate would have failed the real scenario. Operational experience matters more than definitions.
Rule: ON_HOLD = manual overrides during the day. ON_ICE = permanent schedule changes or avoiding out-of-cycle runs.
🎯 Key Takeaway
PEND_MACH = agent unreachable. First check: disk space.
ON_HOLD = immediate resume. ON_ICE = next scheduled cycle. Learn it.
CHANGE_STATUS -s SUCCESS unblocks downstream after manual fix.
Troubleshooting = logs + trends + correlation + escalation.
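The "check disk space first" step for PEND_MACH can be sketched as a filter over df output. This is an illustration, not an AutoSys tool: the function names and the 95% threshold are assumptions, and the process check simply mirrors the ps -ef | grep autosys step used in this guide.

```bash
# Reads `df -P` output on stdin and prints "mountpoint usage" for every
# filesystem at or above the threshold percentage (default 95).
# usage: df -P | full_filesystems [THRESHOLD]
full_filesystems() {
    awk -v limit="${1:-95}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                       # "100%" -> "100"
        if (use + 0 >= limit + 0) print $6 " " $5
    }'
}

# Hypothetical triage sequence for one agent host: disk, process, connectivity.
# usage: triage_agent HOST
triage_agent() {
    echo "== full filesystems on $1 =="
    ssh "$1" 'df -P' | full_filesystems 95
    echo "== agent processes on $1 =="
    ssh "$1" 'ps -ef | grep -i "[a]utosys"'
    echo "== AutoSys connectivity test =="
    autoping -m "$1"
}
```

Running triage_agent against the affected machine follows the same order as the interview answer: disk first, then the agent process, then connectivity.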

Advanced and scenario questions

These test whether you can reason about AutoSys in complex real-world situations. Senior-level interviews focus heavily on this section.

advanced_qa.txt · BASH
Q: Design an AutoSys workflow for end-of-day batch processing.
A: Use a 3-level hierarchy: master box (overall schedule) → section boxes
   (logical groupings: extract, transform, load, report) → CMD jobs inside each
   section box. Include a pre-check job as box_terminator, n_retrys on I/O jobs,
   alarm_if_fail on all critical jobs, and a post-check job to validate output.

Q: What is box_terminator and when would you use it?
A: box_terminator: 1 on a job means if that job fails, the entire parent box
   immediately moves to FAILURE and all remaining inner jobs are skipped.
   Use it on validation/pre-check jobs whose failure makes all downstream
   processing pointless.

Q: How do you handle a scenario where an upstream file sometimes arrives late?
A: Use a File Watcher job (job_type: FW) with a run_window covering the expected
   arrival period and an appropriate min_file_size (the watch_file_min_size JIL
   attribute). The downstream jobs condition on success(file_watcher_job). This way
   processing starts as soon as the file arrives rather than at a fixed time that
   may be too early.

Q: How do you pass data between AutoSys jobs?
A: Using global variables: the upstream script runs sendevent -E SET_GLOBAL
   -G "VAR_NAME=value". Downstream jobs read it via autostatus -G VAR_NAME or
   reference it in JIL conditions with value(VAR_NAME).
💡Senior interview tip — mention trade-offs and alternatives
When asked 'how would you design X', don't just give one answer. Say 'Option A is a box with a File Watcher. Option B is a scheduled job with polling. Option A is better because...' Show you can compare approaches.
📊 Production Insight
A senior candidate was asked 'How would you handle a file that arrives in multiple chunks?'
Junior answer: 'Use a File Watcher.'
Senior answer: 'Use a manifest file. Upstream writes one .ready file after all chunks are complete. File Watcher watches .ready. This prevents triggering on partial data. Alternatively, use min_file_size set to the expected final size, but manifest is more reliable because chunk order is unpredictable.'
The senior answer showed consideration of edge cases, alternatives, and trade-offs. That's what gets the offer.
Rule: At senior level, every answer should include 'it depends' and then explain the trade-offs.
🎯 Key Takeaway
EOD workflow: hierarchical boxes + pre-check terminator + post-check validation.
box_terminator on validation jobs only. Optional jobs should never be terminators.
File Watcher for unpredictable arrival times. Must have min_file_size and run_window.
Global variables pass data. Use workflow prefixes to avoid collisions.
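The manifest approach from the senior answer can be sketched from both sides: the upstream script that publishes chunks, and the File Watcher definition that waits for the manifest. Assumptions to note: publish_chunks, watcher_jil and the manifest name batch.ready are hypothetical, and the JIL uses watch_file, watch_file_min_size and watch_interval as the File Watcher attributes.

```bash
# Upstream side: copy every chunk, then publish the manifest last.
# usage: publish_chunks DEST_DIR CHUNK_FILE...
publish_chunks() {
    dest=$1
    shift
    manifest="$dest/batch.ready"
    : > "$manifest.tmp"
    for chunk in "$@"; do
        cp "$chunk" "$dest/" || return 1
        basename "$chunk" >> "$manifest.tmp"
    done
    # Rename within the same directory, so the watcher never sees a half-written manifest.
    mv "$manifest.tmp" "$manifest"
}

# Scheduler side: emit JIL for a File Watcher that triggers on the manifest only.
# usage: watcher_jil JOB_NAME MACHINE MANIFEST_PATH
watcher_jil() {
    cat <<EOF
insert_job: $1
job_type: FW
machine: $2
watch_file: $3
watch_file_min_size: 1
watch_interval: 60
EOF
}
```

Piping the output of watcher_jil into jil would create the watcher; downstream jobs then use condition: success(JOB_NAME). Because the manifest lists every chunk name, the first downstream job can also verify that all listed files exist before processing.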
🗂 AutoSys Interview Topics — What to Expect by Level
Junior vs Mid-level vs Senior: The depth changes
Topic area | Junior expected depth | Mid-level expected depth | Senior expected depth
Architecture | Name the components | Explain what each does, failure modes | Design HA, predict failure cascades
JIL commands | Basic insert/update/delete syntax | autorep flags, backup strategies | Complex JIL with conditions, variables
Status codes | Recognise SU/FA/RU/IN | PEND_MACH causes, ON_HOLD vs ON_ICE | Recovery procedures for each status
Scheduling | date_conditions, start_times | run_window, run_calendar | Complex calendars, timezone handling
Fault tolerance | n_retrys definition | box_terminator, term_run_time | HA design, recovery strategy
Troubleshooting | Check logs command | Systematic diagnosis workflow | Root cause analysis, prevention

🎯 Key Takeaways

  • ON_HOLD vs ON_ICE: OFF_HOLD starts immediately. OFF_ICE waits for next schedule. This is tested in almost every interview.
  • autorep flags: -s (summary, the default), -d (detail), -q (JIL dump), -r (a specific earlier run, e.g. -r -1 for the previous run). Know them cold.
  • PEND_MACH = agent unreachable. First check: disk space (df -h). 90% of cases.
  • date_conditions defaults to 0 (disabled). Most people assume it's 1. That's the trap.
  • FORCE_STARTJOB bypasses ALL conditions (time AND dependencies). STARTJOB respects them.
  • box_terminator: 1 on validation only. Never on optional jobs.
  • Senior answers include trade-offs: 'it depends' + comparison of approaches.
  • Have a real example ready for every concept: 'I used ON_HOLD when...'
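The date_conditions trap in the takeaways above can be caught mechanically before a JIL file is loaded. A rough sketch, assuming a hypothetical function name and a simple line-oriented scan of JIL text (as produced by autorep -q or written by hand):

```bash
# Reads JIL on stdin and prints the name of every job that defines start_times
# but never sets date_conditions to 1 (or y), so its schedule would be ignored.
# usage: jobs_missing_date_conditions < jobs.jil
jobs_missing_date_conditions() {
    awk '
        function report() {
            if (job != "" && has_times && !has_dc) print job
        }
        /^[ \t]*insert_job:/ {
            report()                      # close out the previous job, if any
            job = $2; has_times = 0; has_dc = 0
            next
        }
        /^[ \t]*start_times:/                { has_times = 1 }
        /^[ \t]*date_conditions:[ \t]*(1|y)/ { has_dc = 1 }
        END { report() }
    '
}
```

A typical use would be autorep -J % -q | jobs_missing_date_conditions, reviewing each reported job before deciding whether date_conditions: 1 was simply forgotten.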

⚠ Common Mistakes to Avoid

    Memorising answers without understanding the reasoning
    Symptom

    Candidate defines ON_HOLD vs ON_ICE perfectly. When asked 'which would you use during a database migration?', they guess wrong. Interviewer probes deeper and realises lack of operational experience.

    Fix

    For every definition, think of a production scenario where you would use it. Practice explaining both ON_HOLD and ON_ICE with real examples.

    Not knowing the autorep flags
    Symptom

    Candidate says 'I would check the job status' but can't name autorep flags. Interviewer asks 'what's the difference between autorep -d and -q?' Candidate doesn't know.

    Fix

    Memorise: autorep alone (or -s) = summary status table. -d = detail (start/end times). -q = JIL dump (definition). -r = a specific earlier run (e.g. -r -1 for the previous run).

    Confusing ON_HOLD and ON_ICE
    Symptom

    Candidate says 'they're the same.' Most common wrong answer. Immediate negative signal.

    Fix

    Repeat: OFF_HOLD starts immediately if conditions met. OFF_ICE waits for next scheduling cycle. If you can't articulate the difference, you haven't operated AutoSys.

    Being vague about troubleshooting — 'I would check the logs'
    Symptom

    Candidate says 'I would check the logs' without specifying which logs or what to look for. Interviewer hears 'I've never actually done this'.

    Fix

    Specific answers: 'First I check $AUTOUSER/out/event_demon.$AUTOSERV for condition evaluation. Then I check the job's std_err_file. Then I ssh to the agent and check the application log.'

    Not having an end-of-day workflow design ready
    Symptom

    Interviewer asks 'design an EOD batch workflow'. Candidate stalls or gives a flat list of jobs without hierarchy, error handling, or validation.

    Fix

    Have a pattern ready: master box → section boxes (extract/transform/load/report) → CMD jobs. Include pre-check validation as box_terminator. Include post-check verification. Mention n_retrys on network I/O jobs. This shows you've built real workflows.

Interview Questions on This Topic

  • Q: What is AutoSys and what makes it better than cron for enterprise batch processing? (Junior)
    AutoSys is an enterprise workload automation platform. Better than cron because: cross-server dependencies (cron can't make job B wait for job A on another server), centralised monitoring (cron logs are per-server), alerting (cron only emails errors), audit trails (who changed what and when), retry logic (n_retrys), file-watching (event-driven), and global variables (cross-job data passing). Cron is fine for single-server, independent jobs. AutoSys is for multi-server, dependent workflows with SLAs and compliance requirements.
  • Q: Explain the AutoSys architecture and the role of each component. (Mid-level)
    Four main components:
    - Event Server: Relational database (Oracle/Sybase). Stores job definitions, events, statuses, global variables, machine table. Persistent storage.
    - Event Processor: Background daemon (eventor). Reads the Event Server, evaluates conditions, triggers agents. Stateless: recovers state from the database on restart.
    - Remote Agent: Process on each target machine. Receives job start requests, executes commands, reports status back via the Event Server.
    - Clients: CLI tools (sendevent, autorep, jil) and WCC web UI.
    Data flow: Event Processor polls Event Server for events → conditions evaluate true → Event Processor sends start to Remote Agent → Agent executes → Agent writes status to Event Server → Event Processor evaluates downstream conditions.
  • Q: What is the difference between ON_HOLD and ON_ICE? What happens when you release each? (Mid-level)
    This is the most common AutoSys interview question.
    ON_HOLD: Job is prevented from running. When released (OFF_HOLD), the job starts IMMEDIATELY if its starting conditions are currently true.
    ON_ICE: Job is prevented from running AND its conditions are frozen. When released (OFF_ICE), the job does NOT start immediately. It waits for conditions to be re-evaluated in the next scheduling cycle.
    Operational example: A job runs at midnight. At 2 PM, you put it ON_ICE. At 3 PM, you release it. It will run at midnight, not at 3 PM.
    When to use: ON_HOLD for temporary pauses (database migration, system maintenance) where you want immediate resume. ON_ICE for permanent schedule changes or when you don't want out-of-cycle runs.
  • Q: A job is in PEND_MACH status. Walk me through how you diagnose and fix it. (Mid-level)
    PEND_MACH means the Remote Agent on the target machine is unreachable.
    Diagnosis steps:
    1. SSH to the agent machine
    2. df -h (#1 cause: a full filesystem prevents the agent from starting)
    3. ps -ef | grep autosys (check if the agent process is running)
    4. From the AutoSys server: telnet agent-host 7520 (check port connectivity)
    5. autoping -m agent-machine (AutoSys connectivity test)
    Most common fixes:
    - Full disk: clean up logs, restart agent
    - Agent not running: start agent service
    - Network issue: coordinate with network team
    - Port blocked: open firewall port 7520
    Prevention: Monitor agent disk space, agent process, and network connectivity. Set up alerting when PEND_MACH persists > 5 minutes.
  • Q: What does date_conditions do and what is its default value? (Junior)
    date_conditions is a binary attribute (0 or 1) that enables or disables time-based scheduling for a job.
    - date_conditions: 0 (default): No time schedule. Job only runs when triggered by conditions or manually. Ignores start_times, days_of_week, run_calendar.
    - date_conditions: 1: Time-based scheduling active. Job runs when time conditions are met AND dependency conditions are true.
    Common trap: Engineers add start_times to a job but forget to set date_conditions: 1. The job never runs on schedule because time scheduling is disabled by default.
    To fix: update_job: jobname, then date_conditions: 1.
  • Q: What is box_terminator and when would you use it? (Mid-level)
    box_terminator: 1 on a job means if that job fails, the entire parent box immediately moves to FAILURE and all remaining pending inner jobs are skipped.
    Use cases:
    - Data validation jobs: if input is invalid, downstream processing would produce garbage
    - Prerequisite checks: required resources not available
    - Security or authentication jobs: no point continuing
    Anti-pattern: Never mark optional jobs as box_terminator. A reporting failure shouldn't stop data loading.
    Example: EOD box has validate_input as first job with box_terminator: 1. If validation fails, the box stops immediately. Nothing else in the box runs. The team is alerted. No time is wasted running jobs that would fail anyway.
    Multiple terminator note: A box can contain more than one terminator job. Whichever terminator fails first stops the box; if the first terminator succeeds and a second one fails, the box stops on that second failure.
  • Q: How do you design an AutoSys workflow for a complex end-of-day batch run? (Senior)
    Use a 3-level hierarchical design:
    Level 1, Master Box:
    - Schedule: date_conditions: 1, start_times: "18:00"
    - Contains all EOD processing
    Level 2, Section Boxes (logical groupings):
    - EXTRACT_BOX: external file pulls, FTP
    - TRANSFORM_BOX: data cleaning, validation
    - LOAD_BOX: database inserts, updates
    - REPORT_BOX: generate outputs, email
    Level 3, CMD Jobs: Individual shell scripts, sqlplus calls, etc.
    Fault tolerance:
    - Pre-check job (validate input) as box_terminator: 1
    - n_retrys: 2 on network I/O jobs (FTP, API calls)
    - term_run_time on all jobs (> 4 hours? kill)
    - alarm_if_fail: 1 on critical jobs
    - Post-check job (validate output, row counts) as final step
    Monitoring:
    - Global variable tracks row counts through pipeline
    - Email on workflow start and completion
    - Dashboard showing box status
    This design is what interviewers expect for senior roles.
  • Q: What is the difference between FORCE_STARTJOB and STARTJOB? (Mid-level)
    FORCE_STARTJOB: Starts the job immediately, bypassing ALL starting conditions, including date_conditions, start_times, and condition dependencies. Use for emergency runs or schedule overrides.
    STARTJOB: Starts the job only if its starting conditions are currently true. Respects the time schedule AND dependency conditions. Rarely used, because if conditions are true the job would have started automatically.
    Risk with FORCE_STARTJOB: If the job has condition: success(upstream_job) and that job never ran or failed, force-starting it means processing may run on incomplete data. Always check conditions first: autorep -J jobname -q | grep condition.
    When to use each:
    - FORCE_STARTJOB: missed schedule, need output now, dependency was manually satisfied
    - STARTJOB: practically never; use FORCE_STARTJOB or RESTART instead
  • Q: How would you pass a record count from one AutoSys job to the next? (Senior)
    Using global variables.
    Setting job (usually a CMD script):
    ```bash
    COUNT=$(wc -l < /data/processed.txt)
    sendevent -E SET_GLOBAL -G "JOB_RECORD_COUNT=${COUNT}"
    ```
    Reading in a downstream JIL condition:
    ```jil
    condition: value(JOB_RECORD_COUNT) >= 1000
    ```
    Reading from the command line for debugging:
    ```bash
    autostatus -G JOB_RECORD_COUNT
    ```
    Best practices:
    - Use a workflow prefix: TRADING_RECORD_COUNT, not COUNT (avoids collisions)
    - Reset at workflow start: sendevent -E SET_GLOBAL -G "COUNT=0"
    - Document all globals: they're instance-wide and persist forever
    Alternative: Write to a file and pass the filename via a global. The CMD job reads the file from a known path. But globals are cleaner for simple values.
  • Q: Walk me through how you recover from a BOX job that went to FAILURE at 3 AM. (Senior)
    Systematic recovery procedure:
    Step 1: Find the root cause
    - autorep -J BOXNAME -d to find which inner job failed
    - autorep -J FAILED_JOB -d to see its exit code and run details
    - Check std_err_file on the agent
    Step 2: Determine failure type
    - Transient (network timeout, temporary lock) → retry
    - Permanent (bad code, missing file, permission) → fix first
    Step 3: Fix the issue
    - Code fix: commit, deploy
    - Missing file: ask upstream team
    - Permission: engage sysadmin
    Step 4: Reset the box state
    - Option A: Force-start the failed job only with sendevent -E FORCE_STARTJOB -J FAILED_JOB. Its success() conditions then trigger the rest of the box.
    - Option B: Restart the entire box with sendevent -E RESTART -J BOXNAME. Only works if the box is in FAILURE status.
    Step 5: Monitor recovery
    - autorep -J BOXNAME -d every 5 minutes
    - Alert on-call if not recovered in 30 minutes
    Step 6: Post-mortem
    - Document root cause and fix. Add monitoring to prevent recurrence. Adjust n_retrys or term_run_time if needed.
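Before a FORCE_STARTJOB during recovery, the "check conditions first" advice from the FORCE_STARTJOB answer can go one step further than grep condition: list the upstream jobs and look at each one's status. A sketch with hypothetical function names, assuming conditions use the success(job) or s(job) forms:

```bash
# Reads JIL (for example the output of `autorep -J JOB -q`) on stdin and prints
# the job names referenced by success(...) or s(...) in condition lines.
upstream_success_deps() {
    grep -E '^[[:space:]]*condition:' |
        grep -Eo '(success|s)\([^)]+\)' |
        sed -E 's/^(success|s)\(([^)]+)\)$/\2/' |
        sort -u
}

# Hypothetical pre-check: show the current status line of every upstream job
# so the operator can confirm the dependencies really were satisfied.
# usage: precheck_force_start JOBNAME
precheck_force_start() {
    autorep -J "$1" -q | upstream_success_deps | while read -r dep; do
        autorep -J "$dep"
    done
}
```

If any upstream job shows a status other than SU, forcing the job means running on incomplete data; fix or rerun the upstream job first.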

Frequently Asked Questions

What AutoSys questions come up most in interviews?

The most common questions, in order:
1. ON_HOLD vs ON_ICE (appears in almost every interview)
2. PEND_MACH causes and resolution
3. date_conditions default and meaning
4. FORCE_STARTJOB vs STARTJOB
5. box_terminator definition and use case
6. Design an end-of-day batch workflow

If you can answer these six questions confidently, you'll pass most AutoSys interviews. The rest are details.

What is the most commonly asked AutoSys interview question?

The ON_HOLD vs ON_ICE question appears in nearly every AutoSys interview. The key answer: releasing from ON_HOLD starts the job immediately if conditions are currently met; releasing from ON_ICE makes the job wait for conditions to reoccur in the next scheduling cycle — it does not start immediately.

Example to solidify: A job runs at midnight. At 2 PM, you put it ON_ICE. At 3 PM, you release it. It runs at midnight, not at 3 PM.

Interviewers love this question because 60% of candidates get it wrong or give a vague answer.

Do I need hands-on AutoSys experience to pass the interview?

For operational roles (SRE, batch operations, production support), yes — interviewers ask specific command syntax and scenario questions that require real experience. Studying concepts is necessary but not sufficient.

For architecture or design roles (job postings that say 'AutoSys knowledge preferred'), you may pass with strong conceptual understanding and transferable experience from other schedulers (Control-M, TWS).

If you don't have direct experience: getting time on a trial or lab environment (check what Broadcom currently offers), or documenting your company's existing AutoSys setup (even just reading JIL definitions), is valuable.

What is the PEND_MACH answer in AutoSys interviews?

PEND_MACH means the Remote Agent on the target machine is unavailable.

Causes (in order of likelihood):
1. Full disk on agent machine (agent service stopped)
2. Agent service not running
3. Machine offline
4. Network issue
5. Firewall blocking port 7520

Diagnosis:
1. SSH to agent: df -h
2. Check agent: ps -ef | grep autosys
3. Test port: telnet agent-host 7520

Fix
  • Full disk: sudo rm /tmp/autosys_logs/* (old logs), restart agent
  • Agent stopped: start agent service
  • Network: engage network team

Interview tip: Say 'disk space is the most common cause — check df -h first' to show operational experience.

How do I explain AutoSys to a non-technical interviewer?

AutoSys is an enterprise scheduling and orchestration tool. It runs thousands of batch jobs every night — like payroll, financial reconciliation, data processing — across hundreds of servers.

Think of it like a very sophisticated alarm clock for a company's servers. It wakes up programs at specific times, in a specific order, waits for them to finish, and immediately alerts the team if anything goes wrong. It can also trigger jobs when a file arrives (event-driven) rather than at a fixed time.

Large banks and insurance companies use it because they can't afford for payroll or trade settlement to fail or run out of order.

This explanation works for hiring managers from non-technical backgrounds (HR, staffing agencies).

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
