AutoSys alarm_if_fail Defaults to 0 — Silent Failures
alarm_if_fail defaults to 0 in AutoSys JIL.
- AutoSys alarms raise alerts in Event Server on job failures, long-running jobs, or machine issues — alarm_if_fail: 1 enables failure alarms
- Key components: alarm_if_fail (job failure), max_run_alarm (long runtime), min_run_alarm (suspiciously fast), alarm_if_terminated (killed jobs), alarm_on_missing (machine offline)
- Performance impact: Alarms stored in Event Server DB; excessive alarms cause database bloat and UI slowdowns — aggregate, don't alert per instance
- Production trap: alarm_if_fail: 0 is default — a silent failure overnight means no one knows until customers complain
- Biggest mistake: Sending failure emails to unmonitored shared mailbox — alarms without response process are no alarms at all
AutoSys alarms are the smoke detectors of your batch environment. When something goes wrong — a job fails, a machine goes offline, a job runs for too long — the alarm fires and the right people get notified before the problem becomes a crisis.
AutoSys has a built-in alarm system that lets you define exactly what events should trigger alerts and who should be notified. Without alarms, your batch jobs could silently fail overnight and nobody would know until users start reporting missing reports in the morning. With well-configured alarms, your team knows within minutes.
But alarms are dangerous. Set them too broadly and your team ignores them (alarm fatigue). Send them to the wrong mailbox and nobody reads them. And because alarm_if_fail defaults to 0, a critically important job can fail every night at 2am without anyone ever hearing about it.
By the end you'll know how to set up job failure alarms, email notifications, machine monitoring, and runtime bounds alerts. You'll also know the specific mistakes that cause alarms to be ignored or to never fire at all.
alarm_if_fail — the basic failure alert
The simplest alarm is alarm_if_fail. Set it to 1 on any job, and AutoSys raises an alarm in the Event Server when that job fails. You can then configure alarm actions to send email, page, or invoke a script.
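As a minimal sketch of the step just described, the script below writes a JIL update that turns the failure alarm on for a single job. The job name `nightly_report` and the file name `update.jil` are hypothetical; `update_job`, `alarm_if_fail` and loading with `jil < file` are the pieces named in this article. The `jil` call is guarded so the script only loads the definition where the AutoSys client is installed.

```shell
#!/bin/sh
# Sketch: write a JIL update that enables the failure alarm for one job.
# "nightly_report" and "update.jil" are hypothetical names for illustration.
JIL_FILE="${JIL_FILE:-update.jil}"

cat > "$JIL_FILE" <<'EOF'
update_job: nightly_report
alarm_if_fail: 1
EOF

# Load the definition only where the AutoSys client is available.
if command -v jil >/dev/null 2>&1; then
    jil < "$JIL_FILE"
else
    echo "jil not found; wrote $JIL_FILE without loading it"
fi
```

Keeping the JIL in a file rather than typing it interactively means the change can be reviewed and version-controlled before it is loaded.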
Run autorep -J % -q | grep -B5 'alarm_if_fail' | grep -v 'alarm_if_fail: 1' to find jobs without failure alarms.
Notification attributes — email on failure
For direct email notification without configuring a separate alarm action, use the notification attributes. These send an email when the job fails.
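A hedged sketch of what that could look like, using only the two notification attributes this article names (`notification_emailaddress` and `notification_msg`). The job name, the address and the file name `notify.jil` are hypothetical. Some AutoSys releases may also need a separate attribute to switch notifications on (for example `send_notification`); that is an assumption here, so check the JIL reference for your version.

```shell
#!/bin/sh
# Sketch: add email notification attributes to an existing job.
# Job name, address and file name are hypothetical placeholders.
JIL_FILE="${JIL_FILE:-notify.jil}"

cat > "$JIL_FILE" <<'EOF'
update_job: nightly_report
alarm_if_fail: 1
notification_emailaddress: batch-oncall@example.com
notification_msg: "nightly_report failed. Check the job error log."
EOF

# Load the definition only where the AutoSys client is available.
if command -v jil >/dev/null 2>&1; then
    jil < "$JIL_FILE"
else
    echo "jil not found; wrote $JIL_FILE without loading it"
fi
```

Point the address at a mailbox or distribution list that someone is actually paged from, not a shared inbox.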
Machine and system alarms
Beyond job-level alarms, AutoSys can alarm on machine events — when an agent goes MISSING or when the Event Processor has issues.
update_machine: new_host alarm_on_missing: 1.
The Silent Friday Night Payroll Failure
The job definition did not include alarm_if_fail: 1. It was omitted entirely, and the default is 0 (no alarm). The team had configured email notifications for successful completion but not for failures. The operations team monitored the dashboard only during business hours. The failure alert was never triggered, and the on-call engineer had no way of knowing about the failure. The job was critical but treated as non-critical in the alarm configuration because no one had reviewed the JIL defaults.
1. Set alarm_if_fail: 1.
2. Added notification_emailaddress: payroll-ops@company.com and notification_msg_on_failure: "Job %s failed on %m at %t with exit code %x. Check log /logs/autosys/payroll_run.err".
3. Added a separate max_run_alarm: 60 to detect hung jobs.
4. Configured the on-call rotation to include weekend coverage with pager duty integration.
5. Added a weekly audit script that lists all jobs with alarm_if_fail: 0 and flags them for review.
- alarm_if_fail: 0 is the default. You must explicitly set it to 1 on every critical job. Do not assume AutoSys alerts on failure.
- A failure without an alarm is a silent outage. Review all JIL files annually to ensure critical jobs have alarms enabled.
- Dashboards are not alarms. If no one is looking at the dashboard when the failure occurs, it's not monitoring.
- Document the on-call escalation process. The failure alert must reach a human, not just a log file.
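The weekly audit mentioned in the fix list could be sketched roughly as follows. It assumes job definitions arrive on standard input in JIL form (as produced by the `autorep -J % -q` command used elsewhere in this article), with each job starting at an `insert_job:` line. The function name `audit_alarms` is hypothetical. A job is flagged when it sets `alarm_if_fail: 0` or omits the attribute altogether.

```shell
#!/bin/sh
# Sketch: list jobs whose definition does not contain "alarm_if_fail: 1".
# Reads JIL text on stdin; "audit_alarms" is a hypothetical helper name.
audit_alarms() {
    awk '
        # Print the job collected so far if it never enabled the alarm.
        function report() { if (job != "" && !armed) print job }

        # A new definition starts: report the previous job, then reset state.
        $1 == "insert_job:" { report(); job = $2; armed = 0 }

        # Mark the current job as covered by a failure alarm.
        $1 == "alarm_if_fail:" && $2 == "1" { armed = 1 }

        # Report the last job in the input.
        END { report() }
    '
}

# Run against the live scheduler only where the AutoSys client is installed.
if command -v autorep >/dev/null 2>&1; then
    autorep -J % -q | audit_alarms
fi
```

Because the function reads plain text, it can also be pointed at JIL files kept in version control, for example `cat jobs/*.jil | audit_alarms`, so missing alarms are caught before a definition is ever loaded.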
update_job: job_name alarm_if_fail: 1. Also check notification_emailaddress and notification_msg.
autorep -M for mailer status. Also check if notification_emailaddress contains spaces or invalid characters. Test with sendevent -E ALARM_TEST.
update_job: job_name alarm_if_fail: 1 in JIL, then jil < update.jil or use sendevent -E UPDATE_JOB.
Key takeaways
Common mistakes to avoid
Not setting alarm_if_fail: 1 on critical jobs — expecting default to be 1
update_job: job_name alarm_if_fail: 1. Run a quarterly audit: autorep -J % -q | grep -B5 'alarm_if_fail:' | grep -v 'alarm_if_fail: 1' to catch missing alarms.
Not including %x (exit code) and log path in notification_msg
notification_msg: "Job %s failed on %m at %t with exit code %x. Log: /logs/autosys/%s.err". Include the full absolute path to the log file.
Sending alarms to unmonitored shared mailbox
Setting max_run_alarm too low — false positives every night
autorep -J job_name -r -t to see runtime history. For seasonal jobs, use multiple JILs with different thresholds or conditional start times.
Acknowledging alarm without fixing root cause
Interview Questions on This Topic
How do you configure AutoSys to send an email when a job fails?
There are two approaches. (1) Set alarm_if_fail: 1 and configure notification_emailaddress and notification_msg. AutoSys sends email to the specified addresses when the job fails. The notification_msg can include variables %s (job name), %m (machine), %t (timestamp), %x (exit code). (2) Use alarm_action to call a custom script that sends email, pages, or creates a ticket. The alarm_action script receives the alarm details as arguments. The notification approach is simpler; alarm_action is more flexible for integration with ticketing systems or pager duty.
Frequently Asked Questions