
AutoSys Alarms and Notifications — Alerting When Jobs Fail

Set up AutoSys alarms and email notifications for job failures, long-running jobs, and machine issues.
⚙️ Intermediate — basic DevOps knowledge assumed
In this tutorial, you'll learn:
  • Set alarm_if_fail: 1 on all critical jobs — the default is 0 (no alarm)
  • Use notification_emailaddress for direct email alerts; include log file paths in notification_msg
  • max_run_alarm and min_run_alarm provide bounds-based alerting for jobs running too long or suspiciously fast
⚡ Quick Answer
AutoSys alarms are the smoke detectors of your batch environment. When something goes wrong — a job fails, a machine goes offline, a job runs for too long — the alarm fires and the right people get notified before the problem becomes a crisis.

AutoSys has a built-in alarm system that lets you define exactly which events trigger alerts and who is notified. Without alarms, a batch job can fail silently overnight, and nobody finds out until users complain about missing reports the next morning. With well-configured alarms, your team hears about the failure within minutes of the event.

alarm_if_fail — the basic failure alert

The simplest alarm is alarm_if_fail. Set it to 1 on any job, and AutoSys raises an alarm in the Event Server when that job fails. You can then configure alarm actions to send email, page, or invoke a script.

alarm_basic.jil · JIL
insert_job: critical_eod_job
job_type: CMD
command: /scripts/critical_eod.sh
machine: prod-server-01
owner: batchuser
date_conditions: 1
days_of_week: all
start_times: "22:00"
alarm_if_fail: 1          /* raise alarm if job fails */
max_run_alarm: 60         /* also alarm if still running after 60 minutes */
min_run_alarm: 5          /* alarm if completes in less than 5 minutes (suspicious) */
alarm_if_terminated: 1    /* alarm if job is killed/terminated */
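
To make the two run-time bounds concrete, here is a minimal Python sketch of the rules as this article describes them. It is a model for reasoning about thresholds, not AutoSys code: the function name `runtime_alarms` and the labels `long_run` and `short_run` are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional


def runtime_alarms(
    elapsed_minutes: float,
    still_running: bool,
    min_run_alarm: Optional[int] = None,
    max_run_alarm: Optional[int] = None,
) -> List[str]:
    """Return the run-time alarms that the article's rules would raise.

    The labels "long_run" and "short_run" are hypothetical names for this sketch.
    """
    alarms = []
    # Long-run rule: the job is still running after max_run_alarm minutes.
    if max_run_alarm is not None and still_running and elapsed_minutes > max_run_alarm:
        alarms.append("long_run")
    # Short-run rule: the job completed in less than min_run_alarm minutes.
    if min_run_alarm is not None and not still_running and elapsed_minutes < min_run_alarm:
        alarms.append("short_run")
    return alarms


if __name__ == "__main__":
    # Bounds taken from critical_eod_job: min_run_alarm 5, max_run_alarm 60.
    print(runtime_alarms(75, True, 5, 60))   # still running at 75 minutes
    print(runtime_alarms(2, False, 5, 60))   # finished after 2 minutes
    print(runtime_alarms(30, False, 5, 60))  # finished after 30 minutes
```

With the bounds from critical_eod_job, only the first two cases produce an alarm. A run that finishes between 5 and 60 minutes stays quiet, which is the point of setting both limits.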

Notification attributes — email on failure

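For direct email notification without configuring a separate alarm action, use the notification attributes in the job definition. The example below emails one list when the job fails and a different recipient when it succeeds. Depending on your AutoSys release, email notification may also have to be switched on with the send_notification attribute, so check the JIL reference for your version.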

notifications.jil · JIL
insert_job: payroll_run
job_type: CMD
command: /scripts/payroll.sh
machine: finance-server
owner: finuser
date_conditions: 1
days_of_week: fri
start_times: "18:00"
alarm_if_fail: 1
/* Email notification on failure */
notification_emailaddress: batch-ops@company.com,payroll-lead@company.com
notification_emailaddress_on_success: payroll-mgr@company.com
notification_msg: "ALERT: AutoSys job %s FAILED on %m at %t (exit code %x). Check log: /logs/autosys/payroll_run.err"
notification_msg_on_success: "INFO: Payroll run %s completed successfully at %t"
▶ Output
/* On failure: email sent to batch-ops@company.com and payroll-lead@company.com */
/* On success: email sent to payroll-mgr@company.com */
Notification message variables
AutoSys supports variables in notification_msg: %s = job name, %m = machine name, %t = timestamp, %x = exit code. Use these to make your alert emails informative enough that the on-call engineer knows what failed and where to look.
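
Before putting a template into production, it can help to preview what the on-call engineer will actually read. The Python sketch below expands the four placeholders with the meanings listed above. It is a local preview only and says nothing about how AutoSys performs the substitution internally. The function name `render_notification` and the sample timestamp and exit code are assumptions made for the illustration.

```python
import re

# Placeholder meanings exactly as listed in this article:
# %s = job name, %m = machine name, %t = timestamp, %x = exit code.
PLACEHOLDER = re.compile(r"%([smtx])")


def render_notification(template, job, machine, timestamp, exit_code):
    """Expand %s, %m, %t and %x in a notification_msg template (preview only)."""
    values = {"s": job, "m": machine, "t": timestamp, "x": str(exit_code)}
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    template = "ALERT: AutoSys job %s FAILED on %m at %t (exit code %x)."
    # The timestamp and exit code below are made-up sample values.
    print(render_notification(template, "payroll_run", "finance-server",
                              "2024-01-05 18:07", 2))
```

Reading the rendered text is a quick way to spot a template that omits the exit code or the log location.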

Machine and system alarms

Beyond job-level alarms, AutoSys can alarm on machine events — when an agent goes MISSING or when the Event Processor has issues.

machine_alarms.jil · JIL
/* Configure machine-level alarms */
update_machine: prod-server-01
max_load: 100
alarm_on_missing: 1    /* alarm when agent goes offline */

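/* View and acknowledge alarms: run these from a shell, not inside a JIL file. */
/* Confirm the exact options against the command reference for your release. */
# autorep -a              /* show all active alarms */
# sendevent -E ALARM_ACK  /* acknowledge an alarm */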

| Alarm type | Attribute | Triggers when |
|---|---|---|
| Job failure alarm | alarm_if_fail: 1 | Job exits with non-zero code |
| Long run alarm | max_run_alarm: N | Job still running after N minutes |
| Short run alarm | min_run_alarm: N | Job completes in less than N minutes |
| Termination alarm | alarm_if_terminated: 1 | Job is killed (KILLJOB or term_run_time) |
| Machine offline alarm | alarm_on_missing: 1 | Agent machine stops responding |

🎯 Key Takeaways

  • Set alarm_if_fail: 1 on all critical jobs — the default is 0 (no alarm)
  • Use notification_emailaddress for direct email alerts; include log file paths in notification_msg
  • max_run_alarm and min_run_alarm provide bounds-based alerting for jobs running too long or suspiciously fast
  • Alarms need a response process — sending to a shared mailbox nobody monitors defeats the purpose

⚠ Common Mistakes to Avoid

  • Not setting alarm_if_fail: 1 on critical jobs — the default is 0, meaning no alert
  • Not including %x (exit code) and the log path in notification_msg — on-call engineers need to know where to look immediately
  • Setting max_run_alarm without investigating when it fires — alarm fatigue sets in if it triggers every day without action
  • Sending failure emails to a shared mailbox nobody actively monitors — alarms need a response process
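
The first two mistakes can be caught before a job definition is loaded. The following Python sketch is a small lint pass over JIL text. It assumes one `attribute: value` pair per line, as in the examples in this article, and it is not a full JIL parser. The helper names `parse_jobs` and `lint`, the job `nightly_export`, and the "contains a slash" test for a log path are all assumptions made for the illustration.

```python
import re
from typing import Dict, List

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # JIL /* ... */ comments


def parse_jobs(jil_text: str) -> Dict[str, Dict[str, str]]:
    """Parse JIL with one 'attribute: value' per line into {job: {attr: value}}."""
    jobs: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in COMMENT.sub("", jil_text).splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "insert_job":
            current = jobs.setdefault(value, {})
        elif current is not None:
            current[key] = value.strip('"')
    return jobs


def lint(jil_text: str) -> List[str]:
    """Report jobs that miss the alerting settings recommended in this article."""
    findings = []
    for name, attrs in parse_jobs(jil_text).items():
        # Assumption: the site standard is the literal value 1, as used in this article.
        if attrs.get("alarm_if_fail") != "1":
            findings.append(f"{name}: alarm_if_fail is not set to 1")
        msg = attrs.get("notification_msg")
        if msg is not None:
            if "%x" not in msg:
                findings.append(f"{name}: notification_msg has no %x (exit code)")
            if "/" not in msg:  # crude heuristic for "mentions a log path"
                findings.append(f"{name}: notification_msg has no log path")
    return findings


SAMPLE = """
insert_job: payroll_run
job_type: CMD
alarm_if_fail: 1   /* raise alarm if job fails */
notification_msg: "ALERT: job %s FAILED on %m at %t. Check log: /logs/autosys/payroll_run.err"

insert_job: nightly_export
job_type: CMD
command: /scripts/export.sh
"""

if __name__ == "__main__":
    for finding in lint(SAMPLE):
        print(finding)
```

Run against the sample, the sketch flags payroll_run for a message without %x and nightly_export for the missing alarm_if_fail. A check like this fits naturally into code review or a pre-deployment step, where a finding can block the change.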

Interview Questions on This Topic

  • Q: How do you configure AutoSys to send an email when a job fails?
  • Q: What does max_run_alarm do?
  • Q: What variables can you use in AutoSys notification_msg?
  • Q: What does alarm_if_fail: 0 mean (the default)?
  • Q: How do you acknowledge an AutoSys alarm?

Frequently Asked Questions

How do I get notified when an AutoSys job fails?

Set alarm_if_fail: 1 and add notification_emailaddress: your-team@company.com to the job definition. Include a notification_msg with the log file path so on-call engineers know where to look.

What is max_run_alarm in AutoSys?

max_run_alarm specifies a runtime threshold in minutes. If the job is still running after that many minutes, AutoSys raises an alarm. It doesn't kill the job (that's what term_run_time is for). It only alerts the team that the job is taking longer than expected.

What are the notification message variables in AutoSys?

AutoSys supports: %s (job name), %m (machine name), %t (timestamp), %x (exit code). Use these in notification_msg and notification_msg_on_success to make alert emails immediately informative.

What is the default value of alarm_if_fail?

The default is 0, which means no alarm is raised on failure. You must explicitly set alarm_if_fail: 1 on jobs where you want failure alerts. Many teams make this a required attribute in their job definition standards.

How do I acknowledge an AutoSys alarm?

Use sendevent -E ALARM_ACK or acknowledge through the WCC interface. Unacknowledged alarms accumulate in the alarm list. Establishing an alarm acknowledgement process is important for keeping the alarm list meaningful.

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
