Shell Scripting Advanced — SIGTERM Trap Leaves Temp Files
Kubernetes sends SIGTERM on pod shutdown, not SIGINT — failing to trap it leaves /tmp littered.
20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.
- Process substitution (<(cmd)) feeds command output as a file without temp files
- Signal traps (trap 'handler' SIGTERM) catch OS signals to run cleanup
- Subshells ((cmd)) run commands in isolated environments — variables don't leak
- File descriptor management (exec 3>file) prevents descriptor leaks and race conditions
- Performance insight: process substitution avoids disk I/O, but forks per substitution
- Production insight: missing trap leaves zombie processes and broken locks on container restart
- Biggest mistake: assuming wait behaves normally around traps: a trapped signal makes a pending wait return early, before the children have exited
Imagine your shell script is a factory floor manager. Basic scripts just shout instructions one at a time. Advanced scripting is like giving that manager a walkie-talkie, a panic button, a set of private offices for side conversations, and a logbook that writes itself — all at once. Process substitution lets two workers share data without leaving paper on the floor. Signal traps are the emergency stop button. Subshells are the private offices where experiments happen without disturbing the main floor.
Every DevOps engineer has hit the same wall: a shell script that works beautifully on a laptop but silently corrupts data in production, leaves zombie processes behind after a Kubernetes pod restart, or races itself when two cron jobs fire at the same millisecond. That wall is not a Bash limitation — it's the gap between scripting and engineering. The difference is understanding what the shell is actually doing beneath the syntax.
Shell scripts fail in production for three predictable reasons: they don't handle signals (so cleanup never runs when a container dies), they mismanage file descriptors (so logs get garbled or pipes deadlock), and they make assumptions about subshell variable scope (so a loop that 'obviously' increments a counter does nothing). These aren't beginner mistakes — senior engineers hit them too, because they only surface under specific timing conditions or OS configurations.
By the end of this article you'll be able to write scripts that trap and handle SIGTERM gracefully, use process substitution to diff two live command outputs without temp files, manage file descriptors explicitly to prevent descriptor leaks, implement advisory locking to prevent concurrent runs, and structure a production-grade script with a proper exit framework. These are the patterns that make the difference between a script you trust at 3 AM and one you babysit.
What Shell Scripting Advanced Really Means
Advanced shell scripting is the discipline of writing robust, production-grade Bash programs that handle signals, manage resources, and compose complex workflows without leaking state. The core mechanic is explicit control over process lifecycle — trapping SIGTERM, SIGINT, and EXIT to clean up temporary files, release locks, or roll back partial operations. Without these traps, a script killed mid-flight leaves corrupted data and orphaned resources.
In practice, advanced scripting relies on three properties: idempotent cleanup routines, atomic file operations (write to a temporary file, then mv it over the target on the same filesystem), and strict error handling via set -euo pipefail. A trap on EXIT runs cleanup on normal exit and on most unexpected terminations (nothing can run after SIGKILL), but it only helps if the handler itself is idempotent and fast: a slow trap can delay process shutdown and cause cascading failures in orchestrated environments like Kubernetes.
Use advanced patterns when your script manages state beyond its own process — creating temp files, acquiring locks, or modifying shared filesystems. In CI/CD pipelines, cron jobs, or container entrypoints, a missing trap is the difference between a clean retry and a silent corruption that surfaces hours later. This is not about elegance; it's about survival under real-world conditions.
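The atomic file operation mentioned above can be sketched as follows. This is a minimal illustration, not a definitive implementation: atomic_write, demo_dir, and state.conf are hypothetical names, and the sketch assumes the temp file and the target live on the same filesystem so that mv is a rename.

```bash
#!/usr/bin/env bash
set -euo pipefail

# atomic_write TARGET: copy stdin into a temp file next to TARGET,
# then rename it over TARGET so readers never see a partial file.
# (Hypothetical helper for illustration.)
atomic_write() {
  local target=$1 tmp
  tmp=$(mktemp "${target}.XXXXXX")   # same directory, so mv is a rename
  if cat > "$tmp"; then
    mv -f -- "$tmp" "$target"
  else
    rm -f -- "$tmp"                  # do not leave a half-written temp file
    return 1
  fi
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

printf 'version=1\n' | atomic_write "$demo_dir/state.conf"
printf 'version=2\n' | atomic_write "$demo_dir/state.conf"
current=$(cat "$demo_dir/state.conf")
echo "current state: $current"
```

Note that mktemp creates the file with restrictive permissions, so a real script may need a chmod before the mv.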
Process Substitution: How It Works and Where It Breaks
Process substitution (<(command)) lets you pass the output of a command as if it were a file. The shell implements it with a pipe exposed as /dev/fd/N where that is available, or with a temporary named pipe otherwise. Use it when you need to feed the result of a command into something that expects a file argument, like diff, comm, or paste.
The shell creates a file descriptor backed by a pipe, then substitutes the path /dev/fd/N in the command line. The command reads from that descriptor. No temp file hits disk, and there is no background job for you to clean up.
But process substitution only works in bash, zsh, and ksh, not in dash or other strictly POSIX sh implementations. Portable scripts must fall back to temp files or explicit pipes. Also, each substitution forks a child process, so heavy use in a loop can run into process limits.
- The shell creates a pipe, forks a child, and connects the child's stdout to the write end.
- It substitutes the path /dev/fd/N at the command line, so the consumer opens it like a file (it is a pipe, so it cannot be seeked or reread).
- The command runs concurrently with the parent, and its output streams through the pipe instead of being collected first.
- The descriptor is automatically cleaned up when both sides finish.
- Unlike |, process substitution works in argument positions, not just stdin.
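The points above can be seen in a short sketch. The functions old_list and new_list are hypothetical stand-ins for any two real commands, and the exact /dev/fd number printed at the end varies by system.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical stand-ins for two live commands whose output we want to compare.
old_list() { printf '%s\n' alpha beta gamma; }
new_list() { printf '%s\n' alpha gamma delta; }

# Each <(...) is replaced by a path such as /dev/fd/63 that comm opens like a file.
# comm needs sorted input, hence the sort inside each substitution.
only_in_old=$(comm -23 <(old_list | sort) <(new_list | sort))
only_in_new=$(comm -13 <(old_list | sort) <(new_list | sort))

echo "removed: $only_in_old"
echo "added:   $only_in_new"

# Print the path the shell substitutes for a process substitution.
echo "substituted path:" <(true)
```

Because both substitutions sit in argument positions, comm receives two file paths, which a plain pipeline cannot provide.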
Signal Traps: The Cleanup That Never Runs
trap registers a command or function to run when the shell receives a signal or exits. The most common production mistake is trapping only SIGINT and forgetting SIGTERM. Kubernetes, Docker, and systemd all send SIGTERM by default when they want a process to stop gracefully.
An even subtler trap: trapping a signal inside a function scope. The trap is global — once set, it applies to the entire shell session. If multiple scripts source a shared library, traps can collide. Always reset traps at the start of your function and restore them afterward.
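Another trap: relying on wait around a trap handler. When a trapped signal arrives while the script is blocked in wait, bash returns from wait immediately with a status above 128 and then runs the handler, so children may still be running. Call wait again after the handler, or loop until the child has actually exited.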
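A minimal sketch of SIGTERM handling, under stated assumptions: marker stands in for a script's temp file, and the worker runs in a background subshell only so the demo can send it SIGTERM and inspect the result. The TERM handler just calls exit, which lets the single EXIT trap do the cleanup once.

```bash
#!/usr/bin/env bash
set -u

marker=$(mktemp)   # stands in for a temp file the worker must not leave behind

(
  child=
  cleanup() {
    # Idempotent: safe to call even if the child or the file is already gone.
    [ -n "$child" ] && kill "$child" 2>/dev/null
    rm -f "$marker"
  }
  trap cleanup EXIT        # runs on every exit path of this subshell
  trap 'exit 143' TERM     # convert SIGTERM into a normal exit (128 + 15)

  sleep 30 &               # hypothetical long-running job
  child=$!
  wait "$child"            # returns early when the trapped signal arrives
) &
worker=$!

sleep 1                    # give the worker time to install its traps
kill -TERM "$worker"       # what an orchestrator sends on shutdown
wait "$worker"
status=$?

if [ -e "$marker" ]; then result=leaked; else result=cleaned; fi
echo "worker exit status: $status, temp file: $result"
```

In a real script the same traps would sit at the top level, with INT and HUP handled the same way as TERM.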
Subshells and Variable Scope: The Silent Counter Bug
A subshell is a child shell process spawned with (command) or implicitly by pipelines, command substitution, and background jobs. Variables set inside a subshell are invisible to the parent. This is the root cause of the classic "counter doesn't increment" bug.
Pipelines like seq 10 | while read n; do ((count++)); done run the while loop in a subshell. The variable count is updated in the subshell, then lost when the pipeline completes. The fix: use shopt -s lastpipe (bash 4.2+, effective only while job control is off, as it is in non-interactive scripts) to run the last pipeline command in the current shell, or avoid the pipeline entirely by feeding the loop with process substitution or a temporary file.
Another common pattern: cd inside a subshell doesn't change the parent's directory — great for temporary operations, but surprising if you expect persistent changes.
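The counter bug and its two fixes can be reproduced with a sketch like this one. The variable names are illustrative, and the lastpipe result assumes bash 4.2 or later running as a non-interactive script.

```bash
#!/usr/bin/env bash
set -u

# Bug: the while loop runs in a subshell, so the parent's count never changes.
count=0
seq 10 | while read -r n; do count=$((count + 1)); done
piped_count=$count

# Fix 1: feed the loop with process substitution so it runs in this shell.
count=0
while read -r n; do count=$((count + 1)); done < <(seq 10)
procsub_count=$count

# Fix 2: lastpipe runs the last pipeline segment in the current shell
# (needs bash 4.2+ and job control off, which is the default in scripts).
shopt -s lastpipe
count=0
seq 10 | while read -r n; do count=$((count + 1)); done
lastpipe_count=$count

echo "pipeline: $piped_count"
echo "process substitution: $procsub_count"
echo "lastpipe: $lastpipe_count"
```

The sketch uses count=$((count + 1)) rather than ((count++)) because the latter returns a non-zero status when the old value is 0, which would abort a script running under set -e.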
Pipelines, command substitution $(...), and background & all create subshells. Only the lastpipe option (and process substitution) avoid subshells for the last pipeline segment.
Use coproc or named pipes for complex state sharing.
File Descriptor Management: Closing Leaks Before They Close You
Every open file consumes a file descriptor (fd). The shell starts with three: 0 (stdin), 1 (stdout), 2 (stderr). Each additional fd opened with exec N>file or exec N<file counts against the per-process limit (ulimit -n). A leaked fd keeps a deleted file's disk space allocated and blocks unmounting its filesystem, and enough leaks eventually exhaust the limit.
In production scripts, fd leaks often occur when: (1) a function opens a descriptor but doesn't close it on error; (2) a trap handler tries to write to a closed fd; (3) multiple scripts share a lock file and the fd is never released.
The safe pattern: wrap fds in a subshell so they close automatically when the subshell exits. Or use explicit exec N>&- to close. For lock files, use flock which manages the fd lifecycle.
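Both closing styles described above can be sketched as follows. The fd numbers 3 and 4 and the log_file path are arbitrary choices for the demo, not a convention from the article.

```bash
#!/usr/bin/env bash
set -u

log_file=$(mktemp)   # hypothetical log destination for the demo

# Style 1: open fd 3 explicitly, write through it, then close it explicitly.
exec 3>>"$log_file"
echo "first entry" >&3
echo "second entry" >&3
exec 3>&-            # fd 3 is now closed in this shell

# Style 2: open the fd inside a subshell; it closes when the subshell exits.
(
  exec 4>>"$log_file"
  echo "third entry" >&4
)

line_count=$(wc -l < "$log_file")
echo "lines written: $line_count"

# Writing to a closed descriptor fails, which is what happens when a trap
# handler logs through an fd that was already closed.
if { echo "late entry" >&3; } 2>/dev/null; then
  fd3_closed=no
else
  fd3_closed=yes
fi
echo "fd 3 closed: $fd3_closed"

rm -f "$log_file"
```

The subshell style costs a fork but cannot leak, which makes it a reasonable default inside functions that may return early on error.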
Use lsof -p $$ | wc -l to monitor fd count in production.
Production Patterns: Exit Handlers, Locks, and Logging
Production shell scripts need four things to survive at 3 AM: (1) a guaranteed cleanup function attached to EXIT, (2) advisory locking to prevent concurrent runs, (3) structured logging with timestamps and severity levels, and (4) strict error handling with set -euo pipefail.
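The exit handler should remove temp files, release locks, and flush buffers. Locking with flock is atomic, and the lock is released when the holding process exits, so a crash cannot leave a stale lock the way a mkdir-based lock can. On NFS, flock behaviour depends on the client kernel and the server, so verify it before relying on it there. Logging should write to a file with timestamps and source context, and rotate via logrotate; never let the script manage rotation.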
Error handling: set -e makes the script exit on any unchecked failure. set -u treats unset variables as errors. set -o pipefail ensures pipeline failures propagate. Combine them at the top of every script.
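A compact skeleton combining the four pieces might look like the sketch below. It assumes the flock utility from util-linux is installed. The log function, fd 9, and the lock path are illustrative choices: a real job would use a fixed, well-known lock path rather than one inside a temp directory, and would redirect the log output to a file.

```bash
#!/usr/bin/env bash
set -euo pipefail

work_dir=$(mktemp -d)
lock_file="$work_dir/job.lock"   # demo only; real jobs need a fixed shared path

# log LEVEL MESSAGE...: timestamped, levelled line on stderr (hypothetical helper).
log() {
  local level=$1
  shift
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*" >&2
}

cleanup() {
  rm -rf "$work_dir"             # idempotent: safe to run more than once
}
trap cleanup EXIT
trap 'log ERROR "command failed at line $LINENO"' ERR

# Hold the lock on fd 9 for the life of the script.
# -n makes flock fail at once instead of queueing behind another run.
exec 9>"$lock_file"
flock -n 9 || { log WARN "another instance is running"; exit 1; }
log INFO "lock acquired, working in $work_dir"

# A second attempt on the same file is refused while fd 9 holds the lock.
if flock -n "$lock_file" true; then
  second_run=allowed
else
  second_run=blocked
fi
log INFO "second attempt: $second_run"
```

Holding the lock on a descriptor rather than wrapping a single command means it is released automatically when the script exits, whatever the exit path.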
Use trap 'log_error "Command failed at line $LINENO"' ERR to catch failures.
Why Your Bash Script Only Worked Once: Execution Environment
You SSH into a box, run your script, and everything works. Five minutes later, it fails for the next person. The problem is never the logic—it's the environment you assumed existed. Every script you write must enforce its own reality. Never trust PATH. Never assume a tool is installed. Never assume the script runs from a specific directory. At minimum, take the absolute path of the script's location, set PATH to known values, and validate required binaries at the top. This is not paranoia. This is what separates a script that runs in production from one that burns a pager at 3 AM. The ten seconds you spend hardening the execution environment saves hours of debugging. Start every advanced script with environment assertions. It is the first line of defense against silent failures that cascade into data loss.
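The environment assertions described above could be sketched as a preamble like this. The require function is a hypothetical helper, and the PATH value is an assumption that should be adjusted for the hosts the script runs on.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Resolve the directory containing this script, whatever the caller's cwd is.
script_dir=$(cd -- "$(dirname -- "${BASH_SOURCE[0]:-$0}")" && pwd -P)

# Replace the inherited PATH with a known value (assumed layout; adjust per host).
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# require TOOL...: report every missing binary, then fail once (hypothetical helper).
require() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# A real script would list its actual dependencies, e.g. require jq curl || exit 1
if require sh rm date; then
  env_ok=yes
else
  env_ok=no
fi
echo "script dir: $script_dir, environment ok: $env_ok"
```

Reporting all missing tools in one pass saves a round trip for whoever has to fix the host.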
jq isn't found. Always re-define PATH in the script, never rely on inherited context.
Arrays and Iteration: The Silent Infinite Loop
You wrote a loop over files. It runs forever, fills the disk, and kills the partition. Sound familiar? The classic mistake: using for file in $(ls *.log) instead of for file in *.log. The first splits the ls output on whitespace and then applies pathname expansion to each word, so it breaks on filenames with spaces or newlines. The second is safe, fast, and correct. The deeper issue is that many developers treat arrays as an afterthought. In production scripting, arrays are essential for safe iteration over unpredictable data: lists of PIDs, file paths, API response fields. Always quote your array expansions: "${array[@]}". This preserves each element as a single token. Combined with set -u, a misspelled variable name becomes a hard error, not silent corruption. Use associative arrays (declare -A, bash 4+) for key-value lookups instead of parsing config files a hundred times. Your loops will be deterministic, your scripts will not eat memory, and your on-call will thank you.
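The difference between the two loop styles can be seen in a small sketch. The file names are made up for the demo, and the unsafe loop is included only to show the miscount.

```bash
#!/usr/bin/env bash
set -u

work_dir=$(mktemp -d)
touch "$work_dir/app.log" "$work_dir/my service.log" "$work_dir/notes.txt"

# Unsafe: word splitting turns "my service.log" into two separate words.
unsafe_count=0
for f in $(ls "$work_dir"/*.log); do
  unsafe_count=$((unsafe_count + 1))
done

# Safe: the glob yields one word per file; keep the matches in an array.
shopt -s nullglob              # an unmatched glob expands to nothing
logs=("$work_dir"/*.log)
safe_count=${#logs[@]}

for f in "${logs[@]}"; do      # quoted expansion keeps each element whole
  printf 'log file: %s\n' "${f##*/}"
done

# while-read keeps each input line intact, including internal spaces.
lines=()
while IFS= read -r line; do
  lines+=("$line")
done < <(printf '%s\n' "foo bar" "baz")

echo "unsafe loop iterations: $unsafe_count, array elements: $safe_count"
echo "lines read: ${#lines[@]}"

rm -rf "$work_dir"
```

Setting nullglob matters in production: without it, an empty directory makes the loop run once with the literal pattern as the file name.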
for i in $(cat file) splits on whitespace, not lines. If file contains foo bar, you get two iterations: 'foo' and 'bar'. Use while IFS= read -r line; do ... done < file to get each line exactly as it exists.
Avoid for x in $(...). Use arrays or while-read loops. Always quote array expansions with "${array[@]}" to preserve element boundaries.
The SIGTERM Trap That Left Temp Files Behind
- Always trap EXIT for cleanup: in bash it fires on normal exit and on catchable signals such as SIGTERM, though nothing runs after SIGKILL.
- Never assume the signal your script receives; production orchestrators send SIGTERM.
- Test signal handling with 'kill -TERM <pid>' in a staging environment.
trap cleanup EXIT SIGTERM SIGINT
kill -TERM $$ # test locally
Key takeaways
Common mistakes to avoid
Memorising syntax before understanding the concept
Skipping practice and only reading theory
Using trap without EXIT
Assuming pipeline variables persist
Not closing file descriptors
Assuming wait works inside a trap handler
Interview Questions on This Topic
Explain how process substitution works under the hood. What happens when you write `diff <(cmd1) <(cmd2)`?
Bash creates one pipe per substitution and forks a child for each of cmd1 and cmd2, with each child's stdout connected to the write end of its pipe. It substitutes the path /dev/fd/N (where N is the read end of the pipe) at the command line. diff opens those paths like regular files and reads the output. The children run concurrently. The descriptors are cleaned up when both diff and the children finish. One key detail: on Linux, /dev/fd is a symlink to /proc/self/fd, the calling process's own fd directory, so every fd the process has open is visible there.
Frequently Asked Questions