AutoSys Alarms: The 1 Setting That Silences Critical Failures
- AutoSys alarms raise alerts in Event Server on job failures, long-running jobs, or machine issues — alarm_if_fail: 1 enables failure alarms
- Key components: alarm_if_fail (job failure), max_run_alarm (long runtime), min_run_alarm (suspiciously fast), alarm_if_terminated (killed jobs), alarm_on_missing (machine offline)
- Performance impact: Alarms stored in Event Server DB; excessive alarms cause database bloat and UI slowdowns — aggregate, don't alert per instance
- Production trap: alarm_if_fail: 0 is default — a silent failure overnight means no one knows until customers complain
- Biggest mistake: sending failure emails to an unmonitored shared mailbox. An alarm without a response process is no alarm at all.
AutoSys Alarm Debug Cheat Sheet

Job failed, no alarm (suspected missing alarm_if_fail)
    autorep -J job_name -q | grep -i alarm_if_fail
    echo 'Default is 0 (no alarm). Must set to 1.'

Email notification not sent on failure
    autorep -J job_name -q | grep -i notification_email
    autorep -M | grep -i mail

max_run_alarm firing too often (false positives)
    autorep -J job_name -r -t | tail -20 | grep -E 'Start|End|Run Time'
    echo 'Set max_run_alarm to p99 runtime + 20% buffer'

Alarm acknowledged but same alarm reappears next day
    autorep -a -J job_name | grep -i alarm
    echo 'Fix root cause, not just acknowledge'

Machine offline (alarm_on_missing not configured)
    autorep -M prod-server-01 | grep -i alarm
    ping prod-server-01 && echo 'Machine up' || echo 'Machine down'

Production Incident
The job definition did not set alarm_if_fail: 1. It was omitted entirely, and the default is 0 (no alarm). The team had configured email notifications for successful completion but not for failures. The operations team monitored the dashboard only during business hours. The failure alert was never triggered, and the on-call engineer had no way of knowing about the failure. The job was critical but treated as non-critical in the alarm configuration because no one had reviewed the JIL defaults.

Fix:
1. Set alarm_if_fail: 1 on the job.
2. Added notification_emailaddress: payroll-ops@company.com and notification_msg_on_failure: "Job %s failed on %m at %t with exit code %x. Check log /logs/autosys/payroll_run.err".
3. Added a separate max_run_alarm: 60 to detect hung jobs.
4. Configured the on-call rotation to include weekend coverage with PagerDuty integration.
5. Added a weekly audit script that lists all jobs with alarm_if_fail: 0 and flags them for review.

Production Debug Guide
Symptom → Action mapping for common alarm failures in AutoSys environments.

- Job failed but no alarm was raised: set update_job: job_name alarm_if_fail: 1. Also check notification_emailaddress and notification_msg.
- Email notification not sent: check autorep -M for mailer status. Also check if notification_emailaddress contains spaces or invalid characters. Test with sendevent -E ALARM_TEST.

AutoSys has a built-in alarm system that lets you define exactly what events should trigger alerts and who should be notified. Without alarms, your batch jobs could silently fail overnight and nobody would know until users start reporting missing reports in the morning. With well-configured alarms, your team knows within minutes.
But alarms are dangerous. Set them too broadly and your team ignores them (alarm fatigue). Send them to the wrong mailbox and nobody reads them. Leave alarm_if_fail at its default of 0 and a critical job can fail every night at 2 a.m. without anyone hearing about it.
By the end you'll know how to set up job failure alarms, email notifications, machine monitoring, and runtime bounds alerts. You'll also know the specific mistakes that cause alarms to be ignored or to never fire at all.
alarm_if_fail — the basic failure alert
The simplest alarm is alarm_if_fail. Set it to 1 on any job, and AutoSys raises an alarm in the Event Server when that job fails. You can then configure alarm actions to send email, page, or invoke a script.
    insert_job: critical_eod_job
    job_type: CMD
    command: /scripts/critical_eod.sh
    machine: prod-server-01
    owner: batchuser
    date_conditions: 1
    days_of_week: all
    start_times: "22:00"
    alarm_if_fail: 1        /* raise alarm if job fails */
    max_run_alarm: 60       /* also alarm if still running after 60 minutes */
    min_run_alarm: 5        /* alarm if completes in less than 5 minutes (suspicious) */
    alarm_if_terminated: 1  /* alarm if job is killed/terminated */
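
Because the default is silent, it helps to audit definitions like the one above in bulk. The following is a minimal sketch, not an official AutoSys tool: `jobs_without_fail_alarm` is a hypothetical helper name, and it assumes the input is JIL text in the layout `autorep -q` prints, with each job starting at `insert_job:` and each attribute written as `name: value`.

```shell
#!/bin/sh
# Hypothetical helper (not part of AutoSys): reads JIL text on stdin and
# prints the name of every job that does not set alarm_if_fail: 1.
# Assumption: each job definition begins with "insert_job: <name>".
jobs_without_fail_alarm() {
    awk '
        # A new job starts here: report the previous one if it had no alarm.
        /insert_job:/ {
            if (job != "" && !armed) print job
            for (i = 1; i <= NF; i++) if ($i == "insert_job:") job = $(i + 1)
            armed = 0
        }
        # The current job is covered once the failure alarm is switched on.
        /alarm_if_fail:[ \t]*1/ { armed = 1 }
        # Report the last job in the input as well.
        END { if (job != "" && !armed) print job }
    '
}
```

A possible invocation, assuming `autorep` is on the PATH, is `autorep -J % -q | jobs_without_fail_alarm`. Jobs that omit the attribute and jobs that set it to 0 are reported alike, since both leave failures silent.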
Run autorep -J % -q | grep -B5 alarm_if_fail | grep -v 'alarm_if_fail: 1' to find jobs without failure alarms. The pattern must be quoted, otherwise grep treats the trailing 1 as a file name.

Notification attributes: email on failure
For direct email notification without configuring a separate alarm action, use the notification attributes. These send an email when the job fails.
    insert_job: payroll_run
    job_type: CMD
    command: /scripts/payroll.sh
    machine: finance-server
    owner: finuser
    date_conditions: 1
    days_of_week: fri
    start_times: "18:00"
    alarm_if_fail: 1
    /* Email notification on failure */
    notification_emailaddress: batch-ops@company.com,payroll-lead@company.com
    notification_emailaddress_on_success: payroll-mgr@company.com
    notification_msg: "ALERT: AutoSys job %s FAILED on %m at %t. Exit code %x. Check log: /logs/autosys/payroll_run.err"
    notification_msg_on_success: "INFO: Payroll run %s completed successfully at %t"
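
The debug guide recommends checking notification_emailaddress for spaces or invalid characters. A rough pre-flight check along those lines might look like the sketch below. `check_notification_addresses` is a hypothetical helper, and its rules (no spaces, exactly one `@`, a dot in the domain) are simplifying assumptions for illustration, not the validation AutoSys itself performs.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a notification_emailaddress value.
# Prints one line per suspicious entry and returns non-zero if any is found.
check_notification_addresses() {
    list=$1
    bad=0
    old_ifs=$IFS
    IFS=','                     # entries are comma-separated
    set -f                      # no filename expansion while splitting
    for addr in $list; do
        case $addr in
            *" "*)    echo "contains whitespace: '$addr'"; bad=1 ;;
            *@*@*)    echo "more than one @: '$addr'";     bad=1 ;;
            ?*@?*.?*) ;;        # looks like user@host.tld
            *)        echo "not an address: '$addr'";      bad=1 ;;
        esac
    done
    set +f
    IFS=$old_ifs
    return $bad
}
```

For example, `check_notification_addresses "batch-ops@company.com, payroll-lead@company.com"` flags the second entry because of the space after the comma.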
Machine and system alarms
Beyond job-level alarms, AutoSys can alarm on machine events — when an agent goes MISSING or when the Event Processor has issues.
    /* Configure machine-level alarms */
    update_machine: prod-server-01
    max_load: 100
    alarm_on_missing: 1   /* alarm when agent goes offline */

    /* View active alarms */
    # autorep -a                /* show all active alarms */
    # sendevent -E ALARM_ACK    /* acknowledge an alarm */
When adding a machine, set the alarm explicitly: update_machine: new_host alarm_on_missing: 1.

| Alarm Type | Attribute | Triggers When | Default Value | Best For |
|---|---|---|---|---|
| Job failure alarm | alarm_if_fail: 1 | Job exits with non-zero code | 0 (disabled) | All production jobs |
| Long run alarm | max_run_alarm: N | Job still running after N minutes | 0 (disabled) | Jobs that can hang (file waits, network calls) |
| Short run alarm | min_run_alarm: N | Job completes in less than N minutes | 0 (disabled) | Jobs with expected minimum runtime (data loads) |
| Termination alarm | alarm_if_terminated: 1 | Job is killed (KILLJOB or term_run_time) | 0 (disabled) | Jobs that should never be killed manually |
| Machine offline alarm | alarm_on_missing: 1 | Agent machine stops responding | 0 (disabled) | All machines hosting critical jobs |
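
The cheat sheet's rule of thumb for max_run_alarm (p99 runtime plus a 20% buffer) can be computed from a list of past runtimes. This is a minimal sketch under stated assumptions: `suggest_max_run_alarm` is a hypothetical helper, runtimes arrive as whole minutes, one per line, and the nearest-rank percentile method is a choice made here rather than anything AutoSys prescribes.

```shell
#!/bin/sh
# Hypothetical helper: reads one runtime in whole minutes per line on stdin
# and prints a suggested max_run_alarm (nearest-rank p99 plus 20%, rounded up).
suggest_max_run_alarm() {
    sort -n | awk '
        NF { v[++n] = $1 }                        # collect sorted runtimes
        END {
            if (n == 0) exit 1                    # nothing to base a value on
            rank = int((99 * n + 99) / 100)       # ceil(0.99 * n)
            p99 = v[rank]
            print int((p99 * 120 + 99) / 100)     # ceil(p99 * 1.2)
        }
    '
}
```

Feeding it the five runtimes 41, 38, 45, 40 and 42 gives a p99 of 45 minutes and a suggested threshold of 54. With so few samples the p99 is simply the slowest run, so a longer history gives a more trustworthy number.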
🎯 Key Takeaways
- Set alarm_if_fail: 1 on all critical jobs — the default is 0 (no alarm). Run annual audits to ensure compliance.
- Use notification_emailaddress for direct email alerts; include log file paths and exit code (%x) in notification_msg.
- max_run_alarm and min_run_alarm provide bounds-based alerting for jobs running too long or suspiciously fast.
- Set alarm_on_missing: 1 on all production machines — a missing agent takes down all jobs on that host.
- Alarms need a response process — sending to a shared mailbox nobody monitors defeats the purpose.
Interview Questions on This Topic
- Q: How do you configure AutoSys to send an email when a job fails? (Mid-level)
- Q: What does max_run_alarm do? (Junior)
- Q: What variables can you use in AutoSys notification_msg? (Junior)
- Q: What does alarm_if_fail: 0 mean (the default)? (Junior)
- Q: How do you acknowledge an AutoSys alarm? (Junior)
Frequently Asked Questions
How do I get notified when an AutoSys job fails?
Set alarm_if_fail: 1 and add notification_emailaddress: your-team@company.com to the job definition. Include a notification_msg with the log file path and %x exit code so on-call engineers know where to look.
What is max_run_alarm in AutoSys?
max_run_alarm specifies a runtime threshold in minutes. If the job is still running after that many minutes, AutoSys raises an alarm. It doesn't kill the job (that's term_run_time), it just alerts the team that the job is taking longer than expected.
What are the notification message variables in AutoSys?
AutoSys supports: %s (job name), %m (machine name), %t (timestamp), %x (exit code). Use these in notification_msg and notification_msg_on_success to make alert emails immediately informative.
What is the default value of alarm_if_fail?
The default is 0, which means no alarm is raised on failure. You must explicitly set alarm_if_fail: 1 on jobs where you want failure alerts. Many teams make this a required attribute in their job definition standards.
How do I acknowledge an AutoSys alarm?
Use sendevent -E ALARM_ACK or acknowledge through the WCC interface. Unacknowledged alarms accumulate in the alarm list. Establishing an alarm acknowledgement process is important for keeping the alarm list meaningful.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.