Senior 4 min · March 06, 2026

Shell Scripting Advanced — SIGTERM Trap Leaves Temp Files

Kubernetes sends SIGTERM on pod shutdown, not SIGINT — failing to trap it leaves /tmp littered.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Process substitution (<(cmd)) feeds command output as a file without temp files
  • Signal traps (trap 'handler' SIGTERM) catch OS signals to run cleanup
  • Subshells, written ( cmd ), run commands in an isolated copy of the shell: variable changes don't leak back to the parent
  • File descriptor management (exec 3>file) prevents descriptor leaks and race conditions
  • Performance insight: process substitution avoids disk I/O, but forks per substitution
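  • Production insight: a missing trap leaves orphaned child processes and stale lock files after a container restart
  • Biggest mistake: assuming a single wait guarantees children have finished; a trapped signal makes wait return early with a status above 128, so wait again after the handler runs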
Plain-English First

Imagine your shell script is a factory floor manager. Basic scripts just shout instructions one at a time. Advanced scripting is like giving that manager a walkie-talkie, a panic button, a set of private offices for side conversations, and a logbook that writes itself — all at once. Process substitution lets two workers share data without leaving paper on the floor. Signal traps are the emergency stop button. Subshells are the private offices where experiments happen without disturbing the main floor.

Every DevOps engineer has hit the same wall: a shell script that works beautifully on a laptop but silently corrupts data in production, leaves zombie processes behind after a Kubernetes pod restart, or races itself when two cron jobs fire at the same millisecond. That wall is not a Bash limitation — it's the gap between scripting and engineering. The difference is understanding what the shell is actually doing beneath the syntax.

Shell scripts fail in production for three predictable reasons: they don't handle signals (so cleanup never runs when a container dies), they mismanage file descriptors (so logs get garbled or pipes deadlock), and they make assumptions about subshell variable scope (so a loop that 'obviously' increments a counter does nothing). These aren't beginner mistakes — senior engineers hit them too, because they only surface under specific timing conditions or OS configurations.

By the end of this article you'll be able to write scripts that trap and handle SIGTERM gracefully, use process substitution to diff two live command outputs without temp files, manage file descriptors explicitly to prevent descriptor leaks, implement advisory locking to prevent concurrent runs, and structure a production-grade script with a proper exit framework. These are the patterns that make the difference between a script you trust at 3 AM and one you babysit.

What Is Advanced Shell Scripting?

Advanced shell scripting is the practice of using Bash internals — process substitution, signal traps, subshells, and explicit file descriptor management — to write scripts that survive production conditions. It's not about memorising arcane syntax; it's about understanding how the shell manages processes, file descriptors, and signals under the hood.

You advance from scripting to engineering when you stop treating the shell as a black box. You understand that < <(cmd) is syntactic sugar for a pipe and a file descriptor, that trap cleanup EXIT guarantees cleanup even when your orchestrator sends SIGTERM, and that a pipeline's while loop runs in a subshell — so variables set inside it vanish.

This section sets the stage for the deep dives ahead. The skeleton below combines a cleanup trap with process substitution; subshells and file descriptor management get their own sections later.

thecodeforge_advanced_intro.sh (Bash)
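#!/usr/bin/env bash
# TheCodeForge – advanced shell scripting skeleton
set -euo pipefail

# Private scratch directory, so cleanup never touches another instance's files
workdir=$(mktemp -d /tmp/thecodeforge_XXXXXX)

cleanup() {
    local rc=$?            # exit status the script is ending with
    rm -rf "$workdir"
    exit "$rc"
}
trap cleanup EXIT
# Turn signals into an ordinary exit so the EXIT trap cleans up exactly once
trap 'exit 143' SIGTERM    # 128 + 15
trap 'exit 130' SIGINT     # 128 + 2

log_info() {
    echo "[$(date +'%Y-%m-%dT%H:%M:%S')] [INFO] $*" >> /var/log/thecodeforge.log
}

# Process substitution example: compare file listings
diff <(ls /app/current) <(ls /app/backup) > /dev/null || log_info "Differences found"

echo "Script finished normally"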
Forge Tip:
Type this code yourself rather than copy-pasting. The muscle memory of writing it will help it stick.
Production Insight
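EXIT is a pseudo-signal: Bash runs the EXIT trap whenever the shell exits, whether the script finished normally, called exit, or was terminated by a catchable signal such as SIGTERM or SIGINT.
The one case it cannot cover is SIGKILL, which is uncatchable. Other shells (dash, for example) may not run the EXIT trap when killed by a signal, so trap the signals explicitly there.
Rule: always use trap cleanup EXIT plus any specific signals you need to intercept.
Pro tip: test signal handling with kill -TERM $$ behind a debug flag, or by sending SIGTERM to the running script from another terminal.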
Key Takeaway
Advanced shell scripting = understanding shell internals.
Four pillars: process substitution, traps, subshells, fd management.
Every production script needs set -euo pipefail, trap EXIT, and explicit fd lifecycle.
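A quick way to check the EXIT behavior described above is to run each exit path in a child Bash and capture what the trap prints. This is a minimal sketch that assumes Bash is on the PATH; the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Each case runs in a child Bash so this script survives to compare results.

# 1. Normal completion
normal=$(bash -c 'trap "echo cleanup" EXIT; true')

# 2. Failure under set -e: the child aborts at `false`, the EXIT trap still runs
on_error=$(bash -c 'set -e; trap "echo cleanup" EXIT; false; echo unreachable') || true

# 3. SIGTERM with no TERM trap installed: Bash runs the EXIT trap before dying
on_term=$(bash -c 'trap "echo cleanup" EXIT; kill -TERM $$; sleep 1') || true

printf 'normal=%s on_error=%s on_term=%s\n' "$normal" "$on_error" "$on_term"
```

In Bash all three variables should hold the word cleanup. Running the same three cases under another shell is a cheap way to find out whether it needs explicit signal traps.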

Process Substitution: How It Works and Where It Breaks

Process substitution (<(command)) lets you pass the output of a command as if it were a file. Use it when you need to feed the result of a command into something that expects a file argument, such as diff, comm, or paste.

Where /dev/fd is available, the shell creates a pipe, runs the command with its stdout connected to the write end, and substitutes the path /dev/fd/N for the read end on the command line. Where it is not, Bash falls back to a temporary named pipe (FIFO). Either way the consumer opens the path like any other file, and no temp file hits disk.

Process substitution works in bash, zsh, and ksh, but not in dash or other strictly POSIX sh implementations, so portable scripts must fall back to temp files or explicit pipes. Each substitution also forks a child process, and Bash does not wait for that child by default or let set -e and pipefail see its exit status, so a failing producer can go unnoticed. Heavy use in loops can exhaust process limits.

thecodeforge_process_sub.sh (Bash)
#!/usr/bin/env bash
# TheCodeForge – process substitution with diff
# Compare package lists on two servers via SSH
server_a="web01.prod"
server_b="web02.prod"

diff <(ssh "$server_a" dpkg -l) <(ssh "$server_b" dpkg -l) | head -20

echo "Only on $server_a:"
comm -23 <(ssh "$server_a" dpkg -l | sort) <(ssh "$server_b" dpkg -l | sort)
Process Substitution Mental Model
  • The shell creates a pipe, forks a child, and writes child's stdout to one end.
  • It substitutes the path /dev/fd/N at the command line — ordinary file operations apply.
  • The child runs concurrently with the consuming command; data flows through the kernel pipe buffer, and the writer blocks whenever the buffer is full until the reader catches up.
  • The descriptor is automatically cleaned when both sides finish.
  • Unlike |, process substitution works in argument positions — not just stdin.
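The mental model above can be checked directly: echoing a process substitution shows the path the shell hands to the command. A minimal sketch, assuming Bash on a system with /dev/fd; the sample data is made up for illustration.

```bash
#!/usr/bin/env bash
# The shell replaces <(...) with a path before the command runs
substituted_path=$(echo <(true))
echo "a command would receive a path like: $substituted_path"

# Two live command outputs in argument positions, no temp files:
# comm -12 prints only the lines common to both (sorted) inputs
common=$(comm -12 <(printf '%s\n' apple banana cherry) \
                  <(printf '%s\n' banana cherry date))
echo "$common"
```

On Linux the printed path is typically of the form /dev/fd/N, and the comm call lists banana and cherry.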
Production Insight
Minimal container images may ship without Bash, or without a usable /dev/fd.
The script works locally but can fail on Alpine or distroless bases, often because /bin/sh there is not Bash or no shell is installed at all.
Rule: test process substitution in your production base image before relying on it.
If Bash or /dev/fd is missing, use named pipes or temp files as a portable fallback.
Key Takeaway
Process substitution is syntactic pipe sugar for file-argument contexts.
It forks per instance — don't use in loops without limit.
Test against your production base image or fall back to named pipes.
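For the portable fallback mentioned above, the same comparison can be done with temp files. This is a sketch using only sh-compatible constructs; mktemp is not part of POSIX but is assumed to be available, and the sample lists are invented for the example.

```bash
#!/bin/sh
# Portable sketch: no <(...) needed, so this also runs under shells without process substitution
list_a=$(mktemp) || exit 1
list_b=$(mktemp) || exit 1
trap 'rm -f "$list_a" "$list_b"' EXIT   # remove the temp files on exit

printf '%s\n' apple banana cherry > "$list_a"
printf '%s\n' banana cherry date  > "$list_b"

# Same idea as: comm -23 <(cmd1) <(cmd2), but with files on disk
only_in_a=$(comm -23 "$list_a" "$list_b")
echo "only in first list: $only_in_a"
```

The trade-off is disk I/O and the need for cleanup, which is exactly what the trap line handles.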
When to Use Process Substitution vs Temp Files vs Pipes
If: Command needs a file argument, not stdin
Use: Process substitution or a temp file
If: Command needs stdin and the data comes from a previous command
Use: A pipe (|); no substitution needed
If: Script must be portable to sh
Use: A temp file (mktemp); process substitution is unavailable
If: Multiple data sources are needed simultaneously
Use: Process substitution; each source gets its own fd

Signal Traps: The Cleanup That Never Runs

trap registers a command or function to run when the shell receives a signal or exits. The most common production mistake is trapping only SIGINT and forgetting SIGTERM. Kubernetes, Docker, and systemd all send SIGTERM by default when they want a process to stop gracefully.

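A subtler problem: setting a trap inside a function. Traps are global to the shell process, not scoped to the function, so a trap set in a sourced library silently replaces the caller's handler for that signal. If a function needs its own handler, save the existing trap first and restore it afterward.

Another pitfall: relying on wait around signal handling. Bash defers a trap until the current foreground command finishes, so a handler can be delayed by a long-running command. And when a trapped signal arrives during the wait builtin, wait returns immediately with a status greater than 128, before the children have finished; the handler runs right after. If you need the children to be done, call wait again (or in a loop) after the handler has run, rather than assuming the first wait completed.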

thecodeforge_signal_trap.sh (Bash)
#!/usr/bin/env bash
# TheCodeForge – production-grade trap handler
cleanup() {
    local exit_code=$?   # status the script is exiting with
    echo "[$(date)] Cleanup: removing lock file and temp workspace" >&2
    rm -f /var/run/myapp.lock
    rm -rf /tmp/myapp_*
    exit "$exit_code"
}

# In Bash, the EXIT trap runs on any exit, including termination by SIGTERM
trap cleanup EXIT
# Turn signals into an ordinary exit with the conventional 128+N status,
# so cleanup runs exactly once and the exit code reflects the signal
trap 'exit 143' SIGTERM
trap 'exit 130' SIGINT

echo "Starting batch job..."
# Simulate work
for i in {1..10}; do
    echo "Processing chunk $i"
    sleep 1
done
echo "Job complete"
Warning: Trap Stacking
Setting a new trap replaces the old one for that signal. To preserve previous traps, save and restore them. Use 'old_trap=$(trap -p EXIT)' and then later 'eval "$old_trap"'.
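The save-and-restore idea from the warning above could look like this. It is a sketch for Bash only; with_temporary_trap is a hypothetical function name, and the echo handlers stand in for real cleanup code.

```bash
#!/usr/bin/env bash
trap 'echo "outer cleanup"' EXIT

# Hypothetical library function that needs its own EXIT handler for a while
with_temporary_trap() {
    local saved_trap
    saved_trap=$(trap -p EXIT)          # empty if no handler was installed
    trap 'echo "inner cleanup"' EXIT
    # ... work that relies on the inner handler ...
    if [[ -n "$saved_trap" ]]; then
        eval "$saved_trap"              # reinstall the caller's handler
    else
        trap - EXIT                     # nothing to restore: reset to default
    fi
}

with_temporary_trap
restored_trap=$(trap -p EXIT)
echo "handler after the call: $restored_trap"
```

The empty check matters: if there was no previous handler, eval on an empty string does nothing and the inner trap would stay installed.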
Production Insight
A developer once trapped only SIGINT — manual testing worked fine.
Deployed rolling update sent SIGTERM, no cleanup ran, lock file persisted.
Next deployment failed because the lock file from the dead instance was still there.
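Rule: always trap EXIT; in Bash it is usually the only cleanup trap you need.
Pro tip: to handle log rotation, trap a signal such as SIGHUP or SIGUSR1 and reopen the log file in the handler. The DEBUG trap runs before every simple command and is meant for tracing, not for rotation or cleanup.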
Key Takeaway
Trap EXIT, not just SIGTERM. EXIT fires on any normal or signal-triggered exit.
For bash, you can trap both EXIT and SIGTERM, but EXIT alone suffices.
Never assume one wait is enough: a trapped signal interrupts the wait builtin with a status above 128, so wait again after the handler runs if children must finish.
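To see the takeaway in action without a cluster, the sketch below starts a small worker, sends it SIGTERM the way an orchestrator would, and checks whether its temp file survived. The worker script, the upload.tmp marker, and the 30-second sleep are all stand-ins invented for the demo.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
marker="$demo_dir/upload.tmp"

# Hypothetical worker, written to a file so it can be signalled like a real process
cat > "$demo_dir/worker.sh" <<'EOF'
#!/usr/bin/env bash
marker=$1
cleanup() {
    kill $(jobs -p) 2>/dev/null   # stop any background work still running
    rm -f "$marker"
}
trap cleanup EXIT        # all cleanup lives in the EXIT trap
trap 'exit 143' TERM     # 128 + 15: turn SIGTERM into an ordinary exit
touch "$marker"
sleep 30 &               # stand-in for real work
wait $!                  # wait is interruptible, so the TERM trap runs promptly
EOF

bash "$demo_dir/worker.sh" "$marker" &
worker_pid=$!

# Wait until the worker has created its temp file, then stop it
for _ in $(seq 50); do
    [[ -e "$marker" ]] && break
    sleep 0.1
done
kill -TERM "$worker_pid"
wait "$worker_pid"
worker_status=$?

if [[ -e "$marker" ]]; then leaked=yes; else leaked=no; fi
echo "worker exit status: $worker_status, temp file leaked: $leaked"
rm -rf "$demo_dir"
```

Running the work in the background and waiting on it is a deliberate choice: a foreground sleep 30 would delay the handler until the sleep finished.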

Subshells and Variable Scope: The Silent Counter Bug

A subshell is a child shell process spawned with (command) or implicitly by pipelines, command substitution, and background jobs. Variables set inside a subshell are invisible to the parent. This is the root cause of the classic "counter doesn't increment" bug.

Pipelines like seq 10 | while read n; do ((count++)); done run the while loop in a subshell. The variable count is updated in the subshell, then lost when the pipeline completes. The fix: use shopt -s lastpipe (bash 4.2+), which runs the last pipeline command in the current shell when job control is off (the normal case for scripts, but not for interactive shells), or feed the loop through process substitution or a temporary file instead of a pipe.

Another common pattern: cd inside a subshell doesn't change the parent's directory — great for temporary operations, but surprising if you expect persistent changes.

thecodeforge_subshell_counter.sh (Bash)
#!/usr/bin/env bash
# TheCodeForge – subshell variable scope demo
# This DOES NOT work – counter is lost
count=0
seq 5 | while read n; do ((count+=n)); done
echo "Using pipe: count=$count"  # prints 0

# This works – lastpipe
shopt -s lastpipe 2>/dev/null || true
count=0
seq 5 | while read n; do ((count+=n)); done
echo "With lastpipe: count=$count"  # prints 15

# Alternative using process substitution
count=0
while read n; do ((count+=n)); done < <(seq 5)
echo "Process sub: count=$count"  # also 15
Subshell Scope
Commands in parentheses, pipeline segments, command substitution $(...), and background jobs (&) all run in subshells. With lastpipe, the last pipeline segment runs in the current shell instead; with < <(cmd), the producer runs in a subshell but the loop reading from it stays in the current shell.
Production Insight
A data pipeline script used a while loop over a large CSV file with a counter.
The counter never reached the expected total — data was silently dropped.
Root cause: pipeline subshell lost the accumulator.
Fix: switched to a while read loop fed by input redirection instead of a pipe, so the loop runs in the current shell.
Rule: if you need to persist variables from loops, avoid pipelines or enable lastpipe.
Pro tip: use coproc or named pipes for complex state sharing.
Key Takeaway
Subshells inherit but cannot modify the parent's environment.
Pipelines with while loops are the most common source of silent counter bugs.
Use process substitution < <(cmd) or lastpipe to keep state in the current shell.
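The isolation works for more than counters. The short sketch below, with made-up variable names, shows that a directory change and a variable assignment inside ( ... ) both stay inside, and that BASHPID reveals the separate process.

```bash
#!/usr/bin/env bash
start_dir=$PWD
setting="parent value"

(
    cd / || exit 1
    setting="changed in subshell"
    echo "inside:  PWD=$PWD setting=$setting pid=$BASHPID"
)
echo "outside: PWD=$PWD setting=$setting pid=$BASHPID"

# $$ keeps the parent's PID inside a subshell; BASHPID shows the real process
subshell_pid=$(echo "$BASHPID")
echo "command substitution ran as pid $subshell_pid"
```

This is why a subshell is a handy scope for a temporary cd: there is nothing to undo afterward.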

File Descriptor Management: Closing Leaks Before They Close You

Every open file consumes a file descriptor (fd). The shell starts with three: 0 (stdin), 1 (stdout), 2 (stderr). Each descriptor opened with exec N>file or exec N<file counts against the per-process limit (ulimit -n). A leaked fd keeps a deleted file's disk space allocated, can block unmounting the filesystem, and eventually the process hits the limit and cannot open anything new.

In production scripts, fd leaks often occur when: (1) a function opens a descriptor but doesn't close it on error; (2) a trap handler tries to write to a closed fd; (3) multiple scripts share a lock file and the fd is never released.

The safe pattern: wrap fds in a subshell so they close automatically when the subshell exits. Or use explicit exec N>&- to close. For lock files, use flock which manages the fd lifecycle.

thecodeforge_fd_management.sh (Bash)
#!/usr/bin/env bash
# TheCodeForge – safe fd management
# Using subshell to auto-close fd
(
    exec 3> /var/log/myapp.log
    echo "Logging to fd 3" >&3
    # fd 3 closed when subshell exits
)

# Using an explicit open and close in the same function
myfunction() {
    exec 4< /etc/config/app.conf
    # read config from fd 4, for example: read -r line <&4
    exec 4<&-  # close the descriptor once reading is done
}

# Lock file with flock – fd managed by lock
(
    flock -n 9 || exit 1
    # critical section
    sleep 10
) 9>/var/lock/myapp.lock
Production Insight
A logging script opened a new fd per log file every minute but never closed them.
After 1024 minutes, all fds were consumed — the script could not open any new file.
No logs were written for the next 6 hours before the alert triggered.
Fix: moved to a single fd with rotation, and added exec to close unused fds after each log write.
Rule: every exec N>... should have a matching exec N>&- or be wrapped in a subshell.
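Pro tip: on Linux, ls /proc/$$/fd | wc -l gives the shell's open fd count; lsof -p $$ also works but includes non-descriptor entries such as the working directory and mapped files.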
Key Takeaway
File descriptors are a finite resource: the soft limit reported by ulimit -n is commonly 1024 on Linux.
Leak one per loop iteration and you'll hit "Too many open files" after roughly that many iterations.
Wrap fd usage in subshells or always pair open with close in the same scope.
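Two ways to pair every open with a close, sketched against a throwaway log file. fd_is_open is a hypothetical helper written for this example, not a Bash builtin, and the descriptor numbers 3 and 4 are arbitrary.

```bash
#!/usr/bin/env bash
log_file=$(mktemp)

# Hypothetical helper: succeeds only if descriptor number $1 is open for writing
fd_is_open() { { true >&"$1"; } 2>/dev/null; }

# Pattern 1: explicit open, use, close
exec 3>>"$log_file"
echo "written through fd 3" >&3
exec 3>&-

# Pattern 2: redirection scoped to a command group; fd 4 is closed at the closing brace
{
    echo "first line via fd 4" >&4
    echo "second line via fd 4" >&4
} 4>>"$log_file"

if fd_is_open 3; then fd3_state=open; else fd3_state=closed; fi
lines_written=$(wc -l < "$log_file")
echo "fd 3 is $fd3_state, lines written: $lines_written"
rm -f "$log_file"
```

The scoped form is harder to get wrong, because there is no close statement to forget on an error path.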

Production Patterns: Exit Handlers, Locks, and Logging

Production shell scripts need four things to survive at 3 AM: (1) a guaranteed cleanup function attached to EXIT, (2) advisory locking to prevent concurrent runs, (3) structured logging with timestamps and severity levels, and (4) strict error handling with set -euo pipefail.

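The exit handler should remove temp files, release locks, and flush buffers. Locking with flock is preferable to mkdir-based locks because the kernel releases the lock automatically when the holder exits or crashes, so no stale lock is left behind. Behavior over NFS depends on the kernel and server, so test it before relying on it there. Logging should write to a file with timestamps and source context, and rotate via logrotate; never let the script manage rotation.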

Error handling: set -e makes the script exit on any unchecked failure. set -u treats unset variables as errors. set -o pipefail ensures pipeline failures propagate. Combine them at the top of every script.
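The effect of each option is easy to observe in isolation. The sketch below runs every case in a child Bash so a failure there does not stop the demonstration; undefined_name is a deliberately unset variable chosen for the example.

```bash
#!/usr/bin/env bash
# Pipeline status is the last command's status unless pipefail is set
status_default=$(bash -c 'false | true; echo $?')
status_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')

# set -e: the script stops at the failing command instead of running on
reached_without_e=$(bash -c 'false; echo reached')
reached_with_e=$(bash -c 'set -e; false; echo reached') || true

# set -u: a misspelled variable name is an error instead of an empty string
if bash -c 'set -u; echo "value: $undefined_name"' >/dev/null 2>&1; then
    unset_allowed=yes
else
    unset_allowed=no
fi

echo "pipeline status: default=$status_default pipefail=$status_pipefail"
echo "after false: without -e='$reached_without_e' with -e='$reached_with_e'"
echo "unset variable allowed under -u: $unset_allowed"
```

Without the options, every one of these failures passes silently, which is the scenario the incident reports in this article keep describing.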

thecodeforge_production_pattern.sh (Bash)
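#!/usr/bin/env bash
# TheCodeForge – production script template
set -euo pipefail

# Logging
log_info() { echo "[$(date +'%Y-%m-%dT%H:%M:%S')] [INFO] $*" >> /var/log/myapp.log; }
log_error() { echo "[$(date +'%Y-%m-%dT%H:%M:%S')] [ERROR] $*" >&2; }

workdir=""  # created only after the lock is held

# Exit handler: runs once, from the EXIT trap
cleanup() {
    local rc=$?
    # Remove only this run's workspace, never another instance's files
    if [[ -n "$workdir" ]]; then rm -rf "$workdir"; fi
    flock -u 9 2>/dev/null || true  # explicit release; the kernel also releases it on exit
    log_info "Cleanup complete, exit code $rc"
    exit "$rc"
}
trap cleanup EXIT
trap 'exit 143' SIGTERM  # 128 + 15
trap 'exit 130' SIGINT   # 128 + 2

# Advisory lock: open fd 9 in the main shell so the lock covers the whole run
exec 9>/var/lock/myapp.lock
flock -n 9 || {
    log_error "Another instance is running"
    exit 1
}
log_info "Lock acquired, starting job"

workdir=$(mktemp -d /tmp/myapp_XXXXXX)
# actual work here

exit 0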
Pro Tip: logrotate config
Add a logrotate entry for your script's log file:
/var/log/myapp.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
Production Insight
A script without set -e continued running after a critical failure.
It deleted production data because a backup command failed silently.
Recovery required a full database restore, costing 4 hours of downtime.
Now every script must have set -euo pipefail and reviewers check for it.
Rule: always start production scripts with set -euo pipefail.
Pro tip: use trap 'log_error "Command failed at line $LINENO"' ERR to catch failures.
Key Takeaway
set -euo pipefail is non-negotiable for production scripts.
Use flock for locking — it's atomic and self-cleaning on process death.
Write logs with timestamps and redirect errors to stderr for easy grep filtering.
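The self-cleaning property of flock can be observed in a few lines. This sketch assumes the flock utility from util-linux is installed; the lock file is a throwaway created with mktemp, and descriptors 8 and 9 are arbitrary choices.

```bash
#!/usr/bin/env bash
lock_file=$(mktemp)

# First holder: keep fd 9 open for as long as the lock is needed
exec 9>"$lock_file"
if flock -n 9; then first=acquired; else first=refused; fi

# A second open of the same file contends for the lock, like a second instance would
if ( exec 8>"$lock_file"; flock -n 8 ); then second=acquired; else second=refused; fi

# Closing the descriptor releases the lock; the kernel does the same if the holder dies
exec 9>&-
if ( exec 8>"$lock_file"; flock -n 8 ); then third=acquired; else third=refused; fi

echo "first=$first second=$second third=$third"
rm -f "$lock_file"
```

The lock lives with the open descriptor, not with the file on disk, so there is no lock file to delete after a crash.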
Production incident · Post-mortem · Severity: high

The SIGTERM Trap That Left Temp Files Behind

Symptom
/tmp filled with .tmp files from a shell script that processes CSV uploads. Pod restarts didn't clean them.
Assumption
The trap on SIGINT was enough — the script only expected manual Ctrl+C.
Root cause
Kubernetes sends SIGTERM on pod shutdown, not SIGINT. The trap handler was only defined for SIGINT, so on SIGTERM the script died immediately without cleanup.
Fix
Changed trap to handle both signals, and added a trap on EXIT to guarantee cleanup regardless of how the script exits.
Key lesson
  • Always trap EXIT for cleanup — it fires on any exit, signal or normal.
  • Never assume the signal your script receives; production orchestrators send SIGTERM.
  • Test signal handling with 'kill -TERM <pid>' in a staging environment.
Production debug guide
Common symptoms and immediate actions for process substitution, traps, and subshell bugs (4 entries)
Symptom · 01
Process substitution returns 'No such file or directory'
Fix
Check that the script runs under Bash rather than sh, and that /dev/fd exists. In a container with a minimal base image, either may be missing. Fall back to named pipes or temp files.
Symptom · 02
Cleanup code doesn't run after restart
Fix
Inspect which signal the orchestrator sends. Kubernetes and Docker both send SIGTERM by default. Ensure the trap covers SIGTERM and EXIT.
Symptom · 03
Counter variable doesn't increment inside loop
Fix
Check if pipeline creates subshell. Use 'shopt -s lastpipe' or restructure to avoid subshells. Use a file or named pipe to share state.
Symptom · 04
Script hangs on read or write to file descriptor
Fix
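Check for fd leaks: run 'lsof -p <pid>'. Ensure every descriptor is closed after use with 'exec N<&-' (input) or 'exec N>&-' (output), where N is the descriptor number; without a number these forms close stdin or stdout.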
Shell Script Signal Trap & Subshell Quick Fixes
Three most common production shell script failures and how to fix them in under 60 seconds.
Cleanup not running on container stop
Immediate action
Add a trap on EXIT that calls the cleanup function.
Commands
trap cleanup EXIT; trap 'exit 143' SIGTERM; trap 'exit 130' SIGINT
kill -TERM $$ # test locally
Fix now
trap 'rm -rf /tmp/myscript' EXIT; add SIGTERM and SIGINT traps that call exit so the EXIT trap runs once
Variable not updated after pipeline
Immediate action
Check if the pipeline runs in a subshell.
Commands
echo "$BASHPID" # compare with subshell echo
shopt | grep lastpipe
Fix now
shopt -s lastpipe; count=0; seq 10 | while read n; do ((count++)); done; echo $count  # run in a script: lastpipe needs job control off
File descriptor left open after function returns
Immediate action
Identify fd number and close it.
Commands
exec 3>&- # close fd 3
lsof -a -p $$ -d 3 # is fd 3 still open?
Fix now
Wrap fd in a subshell to auto-close: (exec 3>file; cmd) # fd 3 closed on exit
Comparison of Shell Techniques
Concept                     Use Case                                 Example
Shell Scripting Advanced    Core usage                               See code above
Process Substitution        Multiple file arguments from commands    diff <(cmd1) <(cmd2)
Pipes (|)                   Chaining stdout to stdin                 cmd1 | cmd2
Temp Files (mktemp)         Portable scripts or large output         tmp=$(mktemp); cmd1 > "$tmp"; cmd2 "$tmp"
Flock Locking               Prevent concurrent execution             flock -n 9 || exit 1

Key takeaways

1. Process substitution is piped data as a file: use it for diff, comm, and cases requiring multiple inputs in argument positions.
2. Trap EXIT, not just SIGTERM: it fires on any shell exit and covers signals from orchestrators.
3. Subshells are invisible boundaries for variables: avoid pipelines with stateful loops or use lastpipe/process substitution.
4. File descriptors are a finite resource: always close what you open, or wrap in subshells.
5. Production scripts must set -euo pipefail, lock with flock, and log with timestamps to be trustworthy at 3 AM.
6. Test signal handling against the actual orchestrator (Kubernetes, Docker) in staging, not just with Ctrl+C.

Common mistakes to avoid

6 patterns

Memorising syntax before understanding the concept

Symptom
Unable to adapt patterns to new situations; scripts break when environment changes (e.g., different shell, minimal container).
Fix
Focus on understanding why each technique works (e.g., why process substitution creates a pipe) rather than memorising the syntax. Write small experiments.

Skipping practice and only reading theory

Symptom
Cannot debug a real shell script failure under pressure; know the concepts but fail to apply them in production.
Fix
Set up a sandbox environment (Vagrant, Docker) and deliberately break scripts. Practice debugging with bash -x and trap DEBUG.

Using trap without EXIT

Symptom
Cleanup code never runs when the script receives SIGTERM from orchestrator (Kubernetes, Docker). Temp files and lock files accumulate.
Fix
Put cleanup in a single trap on EXIT, and optionally add SIGTERM/SIGINT traps that simply call exit, so the EXIT handler runs exactly once.

Assuming pipeline variables persist

Symptom
A counter or accumulator inside a while-read loop always stays at initial value, causing silent data corruption or incorrect aggregations.
Fix
Enable lastpipe (shopt -s lastpipe) or use process substitution to avoid the subshell. For portability, use a temporary file to accumulate state.

Not closing file descriptors

Symptom
After many iterations, the script fails with 'Too many open files'. Cannot delete temporary files because the descriptor holds them open.
Fix
Always close fds with exec N>&- after use, or wrap fd usage in a subshell which auto-closes all fds on exit.

Assuming a single wait survives a trapped signal

Symptom
The script moves on or exits before background child processes finish, leaving incomplete work and possibly orphaned processes.
Fix
A trapped signal makes wait return immediately with a status above 128. Check the status and call wait again after the handler runs, or have the handler set a flag and let the main loop wait for completion.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
Explain how process substitution works under the hood. What happens when...
Q02SENIOR
You have a bash script that uses `while read line; do ((count++)); done ...
Q03JUNIOR
What is the difference between `trap cleanup SIGTERM` and `trap cleanup ...
Q04SENIOR
A script uses `(exec 3>logfile; echo "hello" >&3)` and the fd seems to c...
Q01 of 04SENIOR

Explain how process substitution works under the hood. What happens when you write `diff <(cmd1) <(cmd2)`?

ANSWER
The shell creates a pipe for each process substitution and forks one child per substitution, one running cmd1 and one running cmd2, each with its stdout connected to the write end of its pipe. It then substitutes the path /dev/fd/N (where N is the descriptor for the read end) on the command line. diff opens those paths like regular files and reads the output. The children run concurrently with diff. The descriptors are closed once diff and the children finish. One key detail: on Linux, /dev/fd is a symlink to /proc/self/fd, so the path resolves against the descriptors of whichever process opens it; diff can open it because it inherited the read ends from the shell.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is the difference between `$(cmd)` and `< <(cmd)`?
02
Does `trap cleanup EXIT` also run when the script is killed with SIGKILL?
03
Can I use process substitution inside a function that I source from another script?
04
Why does `flock` work better than `mkdir` for locking?
05
How can I debug a script that behaves differently when run by cron?