Senior 3 min · March 24, 2026

SJF Scheduling — New Short Jobs Starve Long Builds

Short build jobs finished fast, but a long integration test waited 4+ hours — uncover the starvation trap in SJF and how aging prevents this problem.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • SJF selects the shortest burst process when CPU becomes free
  • Non-preemptive: runs to completion once started
  • Preemptive (SRTF): preempts if new process has shorter remaining time
  • SJF minimizes average waiting time among non-preemptive algorithms
  • Not usable in practice because burst times are unknown
  • Starvation risk: long processes may never run if short ones keep arriving
Plain-English First

SJF is like a smart cashier who, when finishing a customer, always picks the next customer with the fewest items. This minimises the total time everyone spends waiting. Preemptive SJF (SRTF) is even smarter — if a new short customer arrives while serving a longer one, the cashier switches immediately.

SJF is provably optimal for average waiting time among all non-preemptive scheduling algorithms — and SRTF is optimal among all algorithms including preemptive ones. This optimality is not a heuristic or approximation; it follows directly from an exchange argument. If any solution does not follow SJF order, you can swap adjacent jobs to improve average waiting time.

The practical problem: burst times are not known in advance. Linux's CFS (Completely Fair Scheduler) approximates SJF by tracking vruntime — the virtual runtime each process has accumulated — and always scheduling the process with the smallest vruntime. It is SJF-like without requiring burst time prediction, at the cost of true optimality.

Non-Preemptive SJF

When the CPU becomes free, pick the arrived process with the shortest burst time. Once a process starts, it runs to completion. This is provably optimal among non-preemptive algorithms.

Common misconception: 'Shortest' means smallest expected burst, not actual burst. In practice, we use predicted burst times. The algorithm is also called Shortest Process Next (SPN).

sjf.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
import heapq

def sjf_non_preemptive(processes: list[dict]) -> list[dict]:
    procs = sorted(processes, key=lambda p: p['arrival'])
    results = []
    current_time = 0
    heap = []  # (burst, arrival, process)
    i = 0
    while i < len(procs) or heap:
        # Add all processes that have arrived
        while i < len(procs) and procs[i]['arrival'] <= current_time:
            heapq.heappush(heap, (procs[i]['burst'], procs[i]['arrival'], procs[i]))
            i += 1
        if not heap:
            current_time = procs[i]['arrival']
            continue
        burst, arrival, p = heapq.heappop(heap)
        start = current_time
        ct  = start + p['burst']
        tat = ct - p['arrival']
        wt  = tat - p['burst']
        current_time = ct
        results.append({**p, 'start': start, 'ct': ct, 'tat': tat, 'wt': wt})
    return results

processes = [
    {'pid':'P1','arrival':0,'burst':8},
    {'pid':'P2','arrival':1,'burst':4},
    {'pid':'P3','arrival':2,'burst':2},
    {'pid':'P4','arrival':3,'burst':1},
]
results = sjf_non_preemptive(processes)
for r in results:
    print(f"{r['pid']}: WT={r['wt']}, TAT={r['tat']}")
print(f"Avg WT: {sum(r['wt'] for r in results)/len(results):.2f}")
Output
P1: WT=0, TAT=8
P4: WT=7, TAT=8
P3: WT=8, TAT=10
P2: WT=11, TAT=15
Avg WT: 6.50
Production Insight
Non-preemptive SJF works well in batch systems where job bursts are known and stable.
Starvation occurs when a stream of short jobs arrives before a long job can execute.
Rule: always combine with aging to ensure no job waits beyond a threshold.
Key Takeaway
Non-preemptive SJF is optimal for average WT only when all burst times are known ahead of time.
In online systems with unknown arrivals, it causes starvation of long processes.
Use aging or switch to CFS for production.

Preemptive SJF — SRTF

Shortest Remaining Time First preempts the running process when a new process arrives with shorter remaining time. It is SJF applied continuously — at every arrival, the scheduler re-evaluates and picks the process with smallest remaining burst.

SRTF achieves the absolute minimum average waiting time among all scheduling algorithms, but at a cost: frequent preemptions can increase context switch overhead, and tracking remaining times adds complexity.

srtf.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def srtf(processes: list[dict]) -> list[dict]:
    """Shortest Remaining Time First — preemptive SJF."""
    procs = [dict(p, remaining=p['burst'], start=-1) for p in processes]
    procs.sort(key=lambda p: p['arrival'])
    n = len(procs)
    time = 0
    completed = 0
    current = None
    results = {}
    while completed < n:
        # Find process with shortest remaining time among arrived
        arrived = [p for p in procs if p['arrival'] <= time and p['remaining'] > 0]
        if not arrived:
            time += 1
            continue
        current = min(arrived, key=lambda p: p['remaining'])
        if current['start'] == -1:
            current['start'] = time
        current['remaining'] -= 1
        time += 1
        if current['remaining'] == 0:
            ct  = time
            tat = ct - current['arrival']
            wt  = tat - current['burst']
            results[current['pid']] = {**current,'ct':ct,'tat':tat,'wt':wt}
            completed += 1
    return list(results.values())

results = srtf(processes)
for r in results:
    print(f"{r['pid']}: WT={r['wt']}, TAT={r['tat']}")
print(f"Avg WT: {sum(r['wt'] for r in results)/len(results):.2f}")
Output
P1: WT=9, TAT=17
P2: WT=1, TAT=5
P3: WT=0, TAT=2
P4: WT=0, TAT=1
Avg WT: 2.50
Production Insight
SRTF can cause frequent preemptions if short jobs arrive continuously, leading to high context switch overhead.
Remaining time estimation errors compound: if a process is preempted based on a wrong remaining time, the scheduler may make suboptimal choices.
Rule: use only when burst prediction is reliable and context switch cost is negligible.
Key Takeaway
SRTF minimises average waiting time theoretically but requires accurate remaining time tracking.
It incurs context switch overhead proportional to arrival frequency.
In practice, CFS's vruntime approach avoids these costs while approximating SJF behaviour.

Why SJF is Optimal

SJF minimises average waiting time among all non-preemptive algorithms. Proof by exchange argument: if a longer job runs before a shorter one, swapping them reduces total waiting time. SRTF minimises average waiting time among ALL scheduling algorithms (preemptive and non-preemptive).

The exchange argument is simple: consider two adjacent jobs in a schedule. If a longer job precedes a shorter one, you can swap them and reduce the waiting time of the shorter job by the full length of the longer job, while the longer job waits only the short job's burst. The reduction is always positive.

Starvation Problem
SJF can starve long processes — if short processes keep arriving, long ones never run. Solution: aging — gradually increase priority of waiting processes over time.
Production Insight
Optimality proofs assume all burst times are known and fixed – unrealistic in production.
Starvation is mathematically guaranteed if job arrival rate exceeds service rate for short jobs.
Rule: optimal average WT is not the only metric – consider fairness, response time for interactive tasks, and starvation prevention.
Key Takeaway
SJF is optimal only under ideal conditions: known bursts and no starvation.
Real systems must trade off optimal average WT for fairness and predictable response times.
Aging is the production-proven fix to make SJF practical.

Starvation and Aging

Starvation occurs when a long process never gets CPU because shorter processes keep arriving. Aging is the standard solution: increase the priority of a process the longer it waits. Priority can be based on waiting time (e.g., priority = base_priority + waiting_time / quantum). Once priority surpasses current running process, the scheduler may preempt.

Aging transforms SJF into a priority-based scheduler with time-dependent boosting. It guarantees every process eventually makes progress, at the cost of slightly increased average waiting time for short processes.

Production Insight
Aging parameters matter: too aggressive (high boost rate) makes short jobs wait unnecessarily; too slow doesn't prevent starvation.
In production, start with linear aging (priority += 1 per unit wait) and monitor maximum observed waiting time.
Rule: set a hard bound – no process waits more than N seconds, regardless of aging.
Key Takeaway
Aging is the production-essential addition to SJF.
It guarantees progress but sacrifices some theoretical optimality.
Tune aging rate based on observed max wait times, not generic defaults.

Burst Time Prediction

Real schedulers don't know burst times in advance. Prediction via exponential averaging: τ_{n+1} = α × t_n + (1-α) × τ_n where t_n is actual burst time, τ_n is predicted, α controls responsiveness (typically 0.5).

α close to 1 makes the predictor react quickly to changes but noisy. α close to 0 produces stable predictions but slow to adapt. In Linux's CFS, vruntime replaces burst prediction entirely – the scheduler uses actual CPU usage history instead of predicting future bursts.

Production Insight
Exponential averaging can oscillate if workload patterns are not stationary.
A single outlier burst can corrupt predictions for several subsequent jobs.
Rule: always combine prediction with a sanity-check — if predicted burst exceeds 10x the historical max, fall back to default quantum.
Key Takeaway
Burst prediction is a guess, not a fact.
Exponential averaging is simple but fragile; CFS avoids it entirely by tracking virtual runtime.
For batch systems with stable workloads, offline profiling works better than online prediction.
● Production incidentPOST-MORTEMseverity: high

Starvation Caused by Burst Time Underestimation in Batch Scheduler

Symptom
Short build jobs completed quickly, but a critical long-running integration test job never started for over 4 hours.
Assumption
The scheduler would eventually run the long job when no short jobs were pending.
Root cause
New short jobs kept arriving before long job could run, creating perpetual starvation. SJF non-preemptive never gave the long job a chance because there was always a shorter job.
Fix
Implemented aging: increased priority of waiting processes by 1 for every 10 seconds of waiting. Also set a maximum starvation time after which the job gets highest priority.
Key lesson
  • Starvation is a real production risk with pure SJF – always add aging or use CFS-style fairness.
  • Monitor waiting time per process; alert if any process waits more than 10x its expected burst.
  • In batch systems, consider using priority+aging instead of pure SJF for long-running jobs.
Production debug guideSymptoms and actions for production schedulers using SJF-like logic4 entries
Symptom · 01
A specific job or process never finishes; logs show it's always waiting
Fix
Check waiting time vs burst time ratio. If waiting exceeds 5x burst, starvation is likely. Use aging or increase priority.
Symptom · 02
Average waiting time is higher than expected for a mix of long and short jobs
Fix
Verify scheduler is indeed selecting shortest burst. If bursts are predicted, check prediction accuracy. Consider switching to preemptive SRTF.
Symptom · 03
Frequent preemptions causing high context switch overhead
Fix
Measure context switch rate. If rate is high, increase time quantum or switch to non-preemptive SJF if workloads are mostly short.
Symptom · 04
System throughput drops; long jobs consume excessive CPU after being starved
Fix
Implement starvation detection by tracking last run time per process. Promote starved processes to high priority.
★ Quick Debug: Spotting Starvation in SJF SchedulersUse these commands and checks to identify and fix starvation in real time.
A process has been waiting > 10 seconds after arrival
Immediate action
Check if shorter jobs keep arriving. Run 'ps -eo pid,comm,etime,pcpu' to see wait times.
Commands
ps -eo pid,comm,etime,pcpu --sort=etime | head -20
cat /proc/[pid]/sched | grep 'wait_sum'
Fix now
Manually increase process priority (renice -n -10 [pid]) or kill and restart with higher priority.
Scheduler log shows no progress for a high-burst job+
Immediate action
Check arrival rate of new jobs. If >1 per second, starvation likely.
Commands
sar -u 1 10 | grep average
grep 'context_switch' /proc/stat
Fix now
Temporarily disable new job submissions or increase aging factor.
SJF vs SRTF vs Priority+ Aging
PropertySJF Non-PreemptiveSRTF (Preemptive SJF)SJF + Aging
Average Waiting TimeOptimal (non-preemptive)Optimal (all)Near-optimal (slightly worse for short jobs)
FairnessPoor (starvation risk)Moderate (starvation still possible)Good (every process makes progress)
Context SwitchesMinimal (1 per job)High (at every arrival)Moderate (depends on aging preemption)
Implementation ComplexityLowMedium (track remaining times)Medium (track waiting times)
Production FeasibilityLow (needs burst prediction)Low (needs burst prediction + preemption overhead)High (used in CFS-like systems)

Key takeaways

1
SJF non-preemptive
when CPU free, pick shortest burst among arrived processes.
2
SRTF (preemptive)
preempt when new process has shorter remaining time than current.
3
SJF/SRTF minimise average waiting time
provably optimal.
4
Starvation
long processes may never run if short ones keep arriving — fix with aging.
5
Burst time is unknown in practice
predicted via exponential averaging of past behaviour.
6
Aging guarantees progress but slightly increases average waiting time for short processes.
7
CFS approximates SJF without burst prediction using virtual runtime tracking.

Common mistakes to avoid

3 patterns
×

Assuming burst times are known in advance

Symptom
Scheduler makes suboptimal decisions because predicted bursts differ from actual; average waiting time may exceed theoretic optimal.
Fix
Use exponential averaging with low α (0.3-0.5) and monitor prediction error. For production systems, switch to CFS which uses actual runtime history.
×

Using non-preemptive SJF in interactive systems

Symptom
Users experience high response times for short interactive tasks because a long batch job block CPU.
Fix
Switch to preemptive scheduling (RR or SRTF) or use SJF with preemption after a time quantum.
×

Ignoring starvation until it causes SLA breaches

Symptom
Long-running jobs take orders of magnitude longer than expected, causing timeout failures.
Fix
Implement aging from day one. Set waiting time threshold alarms to detect starvation early.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
Prove that SJF minimises average waiting time among non-preemptive algor...
Q02SENIOR
What is the difference between SJF and SRTF?
Q03SENIOR
How does aging solve the starvation problem in SJF?
Q04JUNIOR
Calculate average waiting time for SRTF: P1(AT=0,BT=8), P2(AT=1,BT=4), P...
Q05SENIOR
Why is SJF not used directly in modern operating systems?
Q01 of 05SENIOR

Prove that SJF minimises average waiting time among non-preemptive algorithms.

ANSWER
Use exchange argument: consider two adjacent jobs A (burst t_a) and B (burst t_b) with t_a > t_b. If A runs before B, total waiting time = t_a + (t_a + t_b) = 2t_a + t_b. Swap them: B before A gives total = t_b + (t_b + t_a) = 2t_b + t_a. Difference = (2t_a + t_b) - (2t_b + t_a) = t_a - t_b > 0. Each inversion increases waiting time. Therefore any optimal schedule must have jobs in increasing burst order.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
Is SJF or SRTF used in real operating systems?
02
What is the difference between starvation and deadlock?
03
How does aging affect average waiting time?
04
Can SJF be used in real-time systems?
🔥

That's Scheduling. Mark it forged?

3 min read · try the examples if you haven't

Previous
FCFS Scheduling — First Come First Served
2 / 4 · Scheduling
Next
Round Robin Scheduling Algorithm