Senior 9 min · March 19, 2026

AutoSys WCC Flow View — Cascade Failure Detection

A missing upstream file stalled 47 jobs at 2 AM.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.

Production tested · Last updated May 23, 2026 · 1,554 articles, all by Naren
Quick Answer
  • WCC is the browser-based UI for AutoSys, hosted on Tomcat (port 8080/8443).
  • Key views: Monitor > Jobs (real-time list), Flow View (dependency graph), Job Activity (timeline).
  • Every WCC action maps to a sendevent command — it's a visual wrapper, not a different control plane.
  • Performance: WCC auto-refreshes every ~30 seconds; for immediate status use autostatus or autorep.
  • Production insight: During an incident, Flow View shows blast radius in seconds; click a failed job to jump to its logs.
  • Biggest mistake: Confusing WCC's refresh lag with real-time — don't trust WCC for sub-second job status without CLI cross-check.
Definition
What is AutoSys Monitoring with WCC?

AutoSys Workload Control Center (WCC) Flow View is a graphical dependency-mapping tool that visualizes job chains in real time, purpose-built for detecting cascade failures — where a single upstream job failure triggers a domino effect of downstream failures. Unlike the traditional AutoSys CLI or basic job status grids, Flow View renders job boxes connected by dependency arrows, color-coded by status (green for success, red for failure, yellow for running, gray for pending).

This lets you spot a failing root job and instantly trace its blast radius across dependent jobs, without manually grepping autorep output or cross-referencing multiple screens. It solves the core problem of incident triage: when 50 jobs fail simultaneously, you need to know which one caused it, not just that they all failed.

WCC is a web-based UI (typically accessed via http://<wcc-server>:<port>/wcc) that replaces the older AutoSys GUI and provides several monitoring views: Monitor > Jobs View for filtering and bulk operations, Monitor > Flow View for dependency visualization, and Monitor > Job Details for logs and run history. During an incident, you'd open Flow View, filter by a job name or box, and watch the red nodes propagate — the first red job in the chain is your root cause.

Flow View also supports zoom, pan, and click-through to job details, making it the fastest way to distinguish a genuine cascade from independent failures. For day-to-day monitoring, you'd use Jobs View with saved filters for critical job streams, but during a P1 outage, Flow View is your primary weapon.

Plain-English First

WCC is the cockpit for AutoSys. Instead of running command-line queries to check job statuses, WCC gives you a live visual dashboard — a map of all your jobs, colour-coded by status, with dependency lines showing how they connect. When something goes red at 3 AM, this is where you look first.

WCC (Workload Control Center) is the browser-based monitoring and management UI for AutoSys. It's hosted on Apache Tomcat (which AutoSys installs automatically) and accessible via a standard web browser. While experienced admins often prefer the command line for speed, WCC is invaluable for understanding the state of a complex job flow at a glance — especially during incident response.

How AutoSys WCC Flow View Reveals Cascade Failures

AutoSys WCC (Workload Control Center) Flow View is a dependency graph visualization that maps job-to-job relationships across a schedule. It renders the directed acyclic graph (DAG) of job chains, showing upstream triggers, downstream dependencies, and current status as of the last refresh (success, failure, running, or queued). Each node represents a job, and each edge represents a start condition such as 'job B starts only after job A completes successfully'. This lets you trace failure propagation quickly.

In practice, Flow View refreshes status every 30–60 seconds (configurable) and color-codes nodes: green for success, red for failure, yellow for running, gray for pending. When a job fails, all downstream jobs that depend on its success turn red or remain pending—this is the cascade failure signature. The view also exposes job-level details like exit codes, start times, and resource usage on hover, so you can distinguish a genuine failure from a skipped job due to upstream conditions.

Use Flow View when diagnosing production incidents involving stalled or failed job streams—especially in batch processing pipelines with hundreds of jobs. It matters because a single upstream failure can silently block dozens of downstream jobs, causing SLA breaches. Without this view, teams waste hours grepping logs or manually checking job statuses. With it, you pinpoint the root cause job in under 30 seconds.

Cascade vs. Blame
A red job in Flow View is often the victim, not the cause. Always trace upstream to the first red node—that's the actual failure.
Production Insight
A nightly ETL pipeline with 200 jobs: one upstream data-load job fails with exit code 1 (file not found). Flow View shows 47 downstream jobs all red. The team panics, restarting everything. The real fix: rerun the single data-load job after fixing the file path.
Symptom: a wall of red jobs in Flow View, all downstream of a single node.
Rule of thumb: always click the earliest red node in the DAG—that's the root cause. Restarting downstream jobs without fixing the upstream is wasted effort.
Key Takeaway
Flow View is a DAG, not a list—trace upstream to find the first failure.
Cascade failures look like many red jobs, but only one is the root cause.
Use Flow View before logs: it cuts diagnosis time from minutes to seconds.
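The "trace upstream to the first red node" rule can be sketched in shell. This is a minimal sketch, not an AutoSys feature: the input format (one "job status upstreams" line per job), the function name find_root_failures, and the job names are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# find_root_failures: reads "job status upstreams" lines on stdin.
#   status    - two-letter code such as FA (failure) or IN (inactive)
#   upstreams - comma-separated job names, or "-" when the job has none
# Prints every failed job that has no failed upstream job (a root cause candidate).
find_root_failures() {
  awk '
    { status[$1] = $2; ups[$1] = $3; order[++n] = $1 }
    END {
      for (i = 1; i <= n; i++) {
        job = order[i]
        if (status[job] != "FA") continue
        root = 1
        m = split(ups[job], u, ",")
        for (k = 1; k <= m; k++)
          if (status[u[k]] == "FA") root = 0   # a failed parent means this job is a victim
        if (root) print job
      }
    }'
}

# Example with hypothetical jobs:
#   printf '%s\n' 'file_watch FA -' 'load_stage FA file_watch' | find_root_failures
```

Failed jobs whose parents also failed are filtered out as victims, so what remains is the short list worth opening first.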
Figure: WCC Monitoring Layers, from browser to Event Server. Browser (WCC web UI: job list, flow view, calendar manager) → Apache Tomcat (WCC application server, hosts the web interface) → AutoSys Server (application layer: autorep / sendevent / jil) → Event Server (database: all job status and definitions).

Accessing WCC

WCC runs on the AutoSys application server. The default URL is typically http://autosys-server:8080/wcc or https://autosys-server:8443/wcc for SSL.

Before you can access it, verify Tomcat is running. The AutoSys installation includes a bundled Tomcat server configured specifically for WCC. You'll find the startup scripts in $AUTOSYS/wcc/bin/.

If you're running WCC in a clustered environment, each application server hosts its own WCC instance. You'll need to know the hostname or load balancer VIP. Most teams bookmark the URL in their on-call browser profile — don't be the person fumbling for a bookmark at 3 AM.

wcc_access.sh (Bash)
# Default WCC URL format
# http://autosys-server-hostname:8080/wcc
# https://autosys-server-hostname:8443/wcc  (SSL)

# Check if WCC (Tomcat) is running on the server
ps -ef | grep -i '[t]omcat'   # bracket trick keeps grep from matching itself

# Check the Tomcat port is listening
ss -tlnp | grep 8080
ss -tlnp | grep 8443

# If WCC is down, restart Tomcat
$AUTOSYS/wcc/bin/shutdown.sh
$AUTOSYS/wcc/bin/startup.sh
You don't need to be on the AutoSys server to use WCC
WCC is a web application — it can be accessed from any machine with network access to the server. Most operations teams access it from their laptops or monitoring workstations rather than SSH'ing into the server.
Production Insight
If WCC is unresponsive but autorep works, Tomcat may be out of memory or disk space. Check $AUTOSYS/wcc/logs/catalina.out for OutOfMemory errors.
Rule: always have the CLI fallback ready — WCC is a convenience, not a requirement.
Key Takeaway
WCC is a Tomcat web app on the AutoSys server.
Access it from any browser on the same network.
If it's down, manage jobs via command-line tools — you never need WCC to run batch.
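A plain `grep 8080` on `ss` output can also match an unrelated port such as 18080 or a process ID. A stricter check, as a minimal sketch: the helper name port_listening is an assumption for illustration, and it expects `ss -tln` style output (state in column 1, local address in column 4) on stdin.

```bash
#!/usr/bin/env bash
# port_listening PORT: reads `ss -tln` output on stdin and succeeds only if
# some socket is in LISTEN state on exactly that local port.
port_listening() {
  awk -v p=":$1" '
    $1 == "LISTEN" && $4 ~ (p "$") { found = 1 }   # anchor at end so :8080 does not match :18080
    END { exit !found }'
}

# Usage on the WCC host:
#   ss -tln | port_listening 8080 && echo "WCC HTTP port is listening"
#   ss -tln | port_listening 8443 && echo "WCC HTTPS port is listening"
```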

Key WCC views for day-to-day monitoring

Monitor > Jobs: A real-time list of all jobs with their current status, colour-coded. Filterable by status, machine, owner, or name pattern. This is your primary monitoring view.

Monitor > Flow View: Shows jobs as nodes in a dependency graph with connecting arrows. When a job fails, you can immediately see which downstream jobs are blocked. Essential for understanding impact during incidents.

Monitor > Job Activity: A timeline view showing job start and end times. Useful for spotting jobs that are running longer than usual.

Administration > Calendars: View and manage AutoSys calendars.

Administration > Machines: View agent machine status — which are ACTIVE, MISSING, or INACTIVE.

Production Insight
Most teams rely solely on Monitor > Jobs, but that view hides dependencies. Flow View reveals the blast radius in seconds.
Rule: during any incident, always open Flow View first.
Don't trust Monitor > Jobs alone — it doesn't show parent-child relationships.
Key Takeaway
WCC has four critical views: Jobs, Flow, Activity, and Admin.
Flow View is the only one that shows dependency chains — use it for incident root cause analysis.
Job Activity is for performance trend spotting, not real-time monitoring.

Using WCC during an incident

When the on-call team gets paged at 3 AM because a batch run is failing, WCC is where you start. The flow of an incident investigation:

  1. Open WCC and go to Monitor > Jobs. Filter by status = FAILURE to see all failed jobs.
  2. Sort by Last End Time and click the earliest failed job. When multiple jobs failed, the earliest failure is the likeliest root cause.
  3. Switch to Flow View to see the dependency chain. This shows you exactly which downstream jobs are blocked.
  4. Check the failed job's logs by clicking 'View Log' or checking its std_out_file attribute.
  5. After fixing the root cause, Force Start the failed job via right-click → Force Start.
  6. Monitor the downstream jobs as they cascade from ACTIVATED to RUNNING to SUCCESS.

Pro tip: In large environments, use the filter to show only jobs that are ON HOLD or FAILURE — this reduces noise and highlights the critical path.

incident_workflow.sh (Bash)
# Step 1: Check WCC Monitor > Jobs, filter by status = FAILURE
# Visual: red jobs immediately visible

# Parallel: command-line equivalent
# autorep has no status filter (-s selects the summary report), so filter its output
autorep -J % | grep -w 'FA'

# Step 2: Click the failed job in WCC to see its attributes and log paths
# Command-line equivalent:
autorep -J failed_job -d
cat /logs/autosys/failed_job.err

# Step 3: Use Flow View to see which downstream jobs are blocked
# Command-line equivalent:
autorep -J % | grep -Ew 'AC|IN' | head -20  # ACTIVATED/INACTIVE jobs; -w avoids matches inside job names

# Step 4: After fixing, restart via WCC (right-click > Force Start)
# Command-line equivalent:
sendevent -E FORCE_STARTJOB -J failed_job
WCC actions and command-line actions are equivalent
Every action you can take in WCC (force start, hold, kill, release) is the same as running the corresponding sendevent command. WCC is just a visual wrapper. Some teams restrict WCC access and require command-line only for audit trail purposes.
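The WCC-to-CLI equivalence can be written down as a small lookup. This is a sketch for illustration: the function name wcc_action_to_sendevent is hypothetical, the menu labels are the ones named in this article, and the function only prints the command rather than running it. The event names are standard sendevent event types.

```bash
#!/usr/bin/env bash
# wcc_action_to_sendevent ACTION JOB: prints the sendevent command that
# corresponds to a WCC right-click action. It does not execute anything.
wcc_action_to_sendevent() {
  local action=$1 job=$2 event
  case $action in
    "Force Start") event=FORCE_STARTJOB ;;
    "Hold")        event=JOB_ON_HOLD ;;
    "Release")     event=JOB_OFF_HOLD ;;
    "Kill")        event=KILLJOB ;;
    *) echo "unknown WCC action: $action" >&2; return 1 ;;
  esac
  printf 'sendevent -E %s -J %s\n' "$event" "$job"
}

# Example:
#   wcc_action_to_sendevent "Hold" nightly_load
```

Printing instead of executing keeps the mapping safe to paste into a runbook, and teams that require a CLI audit trail can log the printed command before running it.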
Production Insight
During an incident, every second counts. WCC's refresh cycle adds 5-30 seconds of delay. If you need immediate status after a Force Start, verify with autostatus -J job_name.
Rule: never trust WCC's display alone after an action — always cross-check with CLI.
The biggest time sink during incidents is chasing the wrong failed job. Flow View eliminates that.
Key Takeaway
Incident flow: filter FAILURE jobs → click first red job → Flow View to see blast radius → fix and Force Start.
Always cross-check WCC status with CLI commands after actions.
The first failed job in the chain is the root cause — don't fix downstream symptoms.
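The CLI cross-check after a Force Start can be scripted as a polling loop. A minimal sketch under stated assumptions: the function names are hypothetical, and it assumes `autostatus -J job` prints a single status word such as RUNNING, SUCCESS, or FAILURE. A FAILURE seen before the job has been observed as STARTING or RUNNING is treated as the stale pre-restart status and ignored.

```bash
#!/usr/bin/env bash
# get_status JOB: thin wrapper so the status source can be swapped out.
get_status() { autostatus -J "$1"; }

# wait_for_outcome JOB [TIMEOUT_SECS] [INTERVAL_SECS]
# Returns 0 on SUCCESS, 1 on FAILURE/TERMINATED after the job was seen active,
# 2 if no final status arrives within the timeout.
wait_for_outcome() {
  local job=$1 timeout=${2:-120} interval=${3:-5} waited=0 started=0 st=""
  while [ "$waited" -le "$timeout" ]; do
    st=$(get_status "$job")
    case $st in
      STARTING|RUNNING) started=1 ;;
      SUCCESS) echo "$job: SUCCESS"; return 0 ;;
      FAILURE|TERMINATED)
        # Only trust a failure once the restarted run has actually been seen.
        if [ "$started" -eq 1 ]; then echo "$job: $st"; return 1; fi ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$job: no final status after ${timeout}s (last: $st)"
  return 2
}

# Usage after a restart:
#   sendevent -E FORCE_STARTJOB -J failed_job && wait_for_outcome failed_job 300 5
```

Waiting for an observed STARTING or RUNNING state before trusting FAILURE is the same guard against double-executing a job that this section recommends for the WCC display.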

Understanding Monitor > Jobs View

The Monitor > Jobs view is your day-to-day dashboard. It displays a table of all jobs with columns: Job Name, Status, Last Start Time, Last End Time, Machine, Owner, and Exit Code. You can sort by any column and filter by status, machine, owner, or job name pattern.

Color coding: SUCCESS (green), FAILURE (red), RUNNING (blue), ACTIVATED (yellow), ON HOLD (grey), INACTIVE (white). This makes status scanning fast.

You can right-click any job to perform actions: Force Start, Hold, Rerun, Kill, Release. The actions are equivalent to sendevent commands.

One overlooked feature: the 'View Log' popup shows the last N lines of the job's stdout and stderr. However, if the log is on the agent and the agent is unreachable, this won't work. Always have direct SSH access as a fallback.

Pro tip: Save a custom filter for jobs on your team's critical path. For example, filter by job name pattern PAYROLL% and status != SUCCESS to quickly spot non-green jobs in the nightly batch.

Filter by Owner to see your team's jobs
If multiple teams share the same AutoSys instance, use the Owner filter to see only your team's jobs. This reduces noise when monitoring during off-hours.
Production Insight
When a job fails, the exit code matters. Exit code 0 = success, non-zero = failure. But some jobs exit with code 0 and still fail logically (e.g., a script that exits 0 after writing an error to stdout). Always check the log.
Rule: don't just rely on the colour — click the job and read the log.
WCC's log viewer is convenient but slow for large logs; use tail -100 via SSH instead.
Key Takeaway
Monitor > Jobs is the real-time list with colour-coded statuses.
Filter by status, machine, owner, or name pattern.
Always verify failures by reading the actual log — exit code alone is not enough.
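Reading the log rather than trusting the colour can be partly automated. The sketch below is an illustration only: the function name log_has_errors and the keyword list are assumptions, and the patterns should be tuned to what your scripts actually print.

```bash
#!/usr/bin/env bash
# log_has_errors LOGFILE [LINES]: scans the last LINES lines (default 100)
# of a job log for common error keywords.
# Returns 0 if a keyword is found, 1 if the tail looks clean, 2 if unreadable.
log_has_errors() {
  local log=$1 lines=${2:-100}
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
  tail -n "$lines" "$log" | grep -Eiq 'error|exception|fatal|traceback'
}

# Usage (the log path is a placeholder):
#   if log_has_errors /logs/autosys/my_job.out; then
#     echo "job exited 0 but its log contains errors"
#   fi
```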

Monitor > Flow View: Managing Dependencies Visually

Flow View is WCC's most powerful feature for incident response. It displays jobs as nodes in a directed graph, with arrows representing condition dependencies (SUCCESS, FAILURE, EXIT code, etc.). You can zoom in/out, pan, and click any node to see its details.

When a job fails, its node turns red. All downstream jobs that depend on it are highlighted as blocked (ON HOLD or ACTIVATED with greyed-out arrows). This visual representation immediately answers the question: 'What is affected?'

Can you edit dependencies from Flow View? No. Flow View is read-only for dependencies, so you must change conditions in JIL. You can still inspect a job's dependencies by clicking it and opening the 'Depends' tab.

Flow View also supports 'Trace' mode: click a job and select 'Trace Upstream' — this highlights all jobs that must succeed before this one can run. Useful for debugging a stuck ACTIVATED job.

Pro tip: In large flows with hundreds of jobs, use the Filter in Flow View to show only jobs with status = FAILURE or ON HOLD. This declutters the graph and highlights the problem chain.

flow_view_tips.sh (Bash)
# Show the start condition defined on a downstream job
# (-q prints the JIL definition, which carries the condition attribute)
autorep -J downstream_job -q | grep -i 'condition:'

# Show all jobs whose condition references a given job (reverse dependency)
autorep -J % -q | awk '/insert_job:/ {job=$2} /condition:/ && /upstream_job_name/ {print job}'

# List every job with its start condition (a flat listing, not a recursive tree)
autorep -J % -q | awk '/insert_job:/ {job=$2} /condition:/ {print "--- " job " depends on --- " $0}'
Think of Flow View as the wiring diagram for your batch
  • Each job = a component in the circuit.
  • Arrows = wires carrying voltage (conditions).
  • Failed job = blown fuse. Red node = fuse blown.
  • Downstream jobs = components waiting for power. They won't run until the upstream fuse is replaced.
  • WCC Flow View is the circuit schematic — trace the failure path from the blown fuse.
Production Insight
In large batch environments (1000+ jobs), Flow View can be slow to load, especially on first render during an incident. If it hangs, use the command-line method to trace dependencies: autorep -J job_name -d.
Rule: for critical chains, pre-generate a static dependency map using JIL exports.
WCC Flow View is the best diagnostic tool, but don't let a slow browser delay incident response — have a CLI fallback script ready.
Key Takeaway
Flow View shows jobs as nodes with dependency arrows.
Trace Upstream helps find why a job is stuck.
Pre-generated dependency maps prevent WCC slowdown during crises.
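A static dependency map can be pre-generated from a JIL export. The sketch below assumes the export comes from `autorep -J % -q` and that conditions use the usual `s(job)`, `f(job)`, `d(job)` style terms; the function name jil_to_edges is hypothetical. It ignores box membership and treats any look-back value after a comma as not part of the job name.

```bash
#!/usr/bin/env bash
# jil_to_edges: reads JIL text on stdin and prints one "upstream -> job"
# line per job referenced in each condition attribute.
jil_to_edges() {
  awk '
    /insert_job:/ {
      for (i = 1; i < NF; i++) if ($i == "insert_job:") job = $(i + 1)
    }
    /^[[:space:]]*condition:/ {
      line = $0
      while (match(line, /[a-zA-Z]+\([^)]+\)/)) {
        name = substr(line, RSTART, RLENGTH)
        sub(/^[a-zA-Z]+\(/, "", name)   # drop the "s(" style prefix
        sub(/\)$/, "", name)            # drop the closing parenthesis
        sub(/,.*$/, "", name)           # drop a look-back value such as ", 12.00"
        print name " -> " job
        line = substr(line, RSTART + RLENGTH)
      }
    }'
}

# Usage: refresh the map on a schedule so it exists before the incident
#   autorep -J % -q | jil_to_edges > /tmp/autosys_edges.txt
#   grep '^file_watch -> ' /tmp/autosys_edges.txt   # direct dependents of file_watch
```

A flat edge list is easy to grep during an incident and can be fed to a graph tool later if a picture is needed.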

Administering Machines and Calendars

The Administration views in WCC let you manage two critical aspects of AutoSys: machines and calendars.

Administration > Machines: Shows a list of all machines connected to the AutoSys application server. Each machine has a status: ACTIVE (agent is connected and healthy), MISSING (agent not responding after 20 minutes), INACTIVE (agent was deactivated). If a machine is MISSING, jobs assigned to that machine will stay in ACTIVATED state waiting for an available agent. You can right-click a machine to toggle its status or view properties.

Administration > Calendars: View and manage calendar definitions. Calendars control which days jobs run. You can create, edit, delete calendars. Changes take effect immediately — no restart needed. Pro tip: Never delete a calendar that's still referenced by active jobs. WCC will warn you, but double-check with autocal_asc first.

Limitation: WCC does not allow you to add or remove machines — only view and change their status. Machine registration is done via JIL or the AutoSys admin utility.

Pro tip: Use the Machine view during an incident to verify that the agent on the critical machine is ACTIVE. A MISSING agent is a common root cause of stuck jobs.

Machines and agents are the same
In AutoSys, a 'machine' refers to an agent that runs jobs. Each agent registers with the application server. If the agent service stops, the machine status goes MISSING after 20 minutes of no heartbeat.
Production Insight
A MISSING machine status is often caused by a stopped agent service or network issue. But sometimes it's because the agent's disk is full and the agent can't write heartbeats. Always check disk space on the agent with df -h.
Rule: monitor machine status proactively — a MISSING machine can silently delay hundreds of jobs until someone notices.
Calendar changes are immediate, but no audit log exists in WCC. If someone deletes a calendar by mistake, recovery requires recreating it from backups.
Key Takeaway
Admin > Machines shows agent status: ACTIVE, MISSING, INACTIVE.
MISSING agents block jobs assigned to them — fix the agent or reassign jobs.
Calendars control job schedules; changes are instant but not audited via WCC.

Troubleshooting WCC Connectivity and Performance

WCC can be temperamental. Here are common issues and what to check:

WCC won't load: Check Tomcat process with ps -ef | grep tomcat. If Tomcat is running, check if the port is listening: ss -tlnp | grep 8080. If Tomcat is not running, start it: $AUTOSYS/wcc/bin/startup.sh. If it crashes immediately, check catalina.out for OutOfMemory errors or port conflicts.

Slow page loads: WCC performance degrades when the database (where AutoSys stores job data) is under load. Also, a large number of jobs (10k+) can slow down the initial page load. Tip: Use filters to reduce the data returned. For example, filter by a specific machine or job name pattern before loading.

Session timeout: WCC has a configurable session timeout (default 30 minutes). If you step away, you'll be logged out. The timeout is set in Tomcat's web.xml (the session-timeout element, in minutes). You can extend it, but that's a security trade-off.

Stale data across multiple tabs: If you have multiple WCC tabs open, each may have its own session. They don't auto-sync. Closing and reopening WCC is the quick fix.

WCC vs CLI conflict: Sometimes WCC shows a job as RUNNING but the CLI shows it as SUCCESS. This is a browser cache issue. Press Ctrl+F5 or clear the browser cache.

Log viewer fails: The 'View Log' button in WCC tries to read from the agent's log location. If the agent is unreachable (MISSING), it will fail. Use SSH to the agent directly.

wcc_troubleshoot.sh (Bash)
# Check Tomcat process
ps -ef | grep '[t]omcat'   # bracket trick keeps grep from matching itself

# Check if port is listening
ss -tlnp | grep -E '8080|8443'

# Check catalina logs for errors
tail -200 $AUTOSYS/wcc/logs/catalina.out

# Restart Tomcat
$AUTOSYS/wcc/bin/shutdown.sh
$AUTOSYS/wcc/bin/startup.sh

# If restart fails, check disk space
# Tomcat needs free space for logs and temp files
df -h

# If WCC is still down, continue operations via CLI
autorep -J %
sendevent -E FORCE_STARTJOB -J my_job
Do not restart Tomcat during active batch runs unless necessary
Restarting Tomcat only affects WCC, not job scheduling. sendevent and batch processing continue independently. However, if an operator is in the middle of an action in WCC (like a Force Start), it will be lost. Use CLI for critical actions during restart.
Production Insight
The most common cause of WCC downtime is a full disk on the application server. Tomcat writes logs aggressively. Set up log rotation or a disk usage alert.
Rule: alerts on disk space are more important than alerts on WCC availability — WCC going down is often a symptom of disk pressure.
Always have a quick-reference card for CLI commands taped to your monitor (or bookmarked). When WCC is down, you can't afford to search for commands.
Key Takeaway
WCC issues are usually Tomcat, database, or disk related.
Restart Tomcat only when necessary — batch operations are unaffected.
Keep CLI commands at hand — WCC is not required to run or monitor jobs.
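Since disk pressure is named as the most common cause of WCC downtime, a disk usage alert is worth sketching. This is a minimal example, not a monitoring product: the function name disk_over_threshold and the 90% default are assumptions. It reads `df -P` output, whose POSIX format puts the capacity in column 5 and the mount point in column 6, and it will misread mount points that contain spaces.

```bash
#!/usr/bin/env bash
# disk_over_threshold [LIMIT_PERCENT]: reads `df -P` output on stdin and
# prints "mountpoint capacity" for each filesystem at or above the limit.
disk_over_threshold() {
  awk -v limit="${1:-90}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6 " " $5
    }'
}

# Usage from cron on the application server (the mail step is a placeholder):
#   full=$(df -P | disk_over_threshold 90)
#   [ -n "$full" ] && echo "$full" | mail -s "disk pressure on WCC host" oncall@example.com
```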

Why Your WCC Jobs Are Silent-Failing: The JIL Validation Gap

You've seen it. A job shows 'SUCCESS' in WCC, but the data it was supposed to produce never landed. That's not a bug in AutoSys — it's a gap in how you define success in JIL. The exit code of a shell script is not the same as business success.

WCC only reports what AutoSys knows: the process exit code. If your script prints 'All good' after crashing on line 47, but still returns exit 0, WCC paints that green square and calls it a day. You're flying blind.

The fix: instrument your JIL definitions with a condition check on the actual outcome, not just the exit code. Append a post-processing job that validates the artifact — file existence, row count, API response code. Then condition that downstream job against 'SUCCESS' of the validator, not the primary executor.

Map your critical indicators in WCC as custom 'reportable' attributes in the job definition. Pull them into a Monitor view column. You will spot a silent failure before the business calls you.

jil_validation_gap.jil (JIL)
/* io.thecodeforge: devops tutorial */
/* Validate job output before trusting WCC status */
/* etl_host_01 is a placeholder machine name; use your own agent */

insert_job: data_extract
job_type: cmd
machine: etl_host_01
command: /opt/scripts/extract.sh
condition: s(ingestion_done)
term_run_time: 60

/* Validator job: only this feeds downstream */
insert_job: validate_extract
job_type: cmd
machine: etl_host_01
command: /opt/scripts/validate_output.sh /data/extract/output.csv
condition: s(data_extract)
term_run_time: 10

/* Downstream waits on validator, not extractor */
insert_job: load_to_dwh
job_type: cmd
machine: etl_host_01
command: /opt/scripts/load.sh
condition: s(validate_extract)
term_run_time: 120
Output
Job data_extract submitted.
Job validate_extract submitted (waiting on data_extract).
Job load_to_dwh submitted (waiting on validate_extract).
Production Trap:
Stop treating exit code 0 as proof of success. Add a validator job for every critical pipeline. Your WCC 'SUCCESS' blocks are lying to you.
Key Takeaway
If your validator doesn't check what the job actually produced, you're monitoring the wrong thing.
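What a validator such as validate_output.sh might check can be sketched as follows. The article does not show that script, so everything here is an assumption: the function name, the CSV-with-header layout, and the minimum row count. The important property is that it exits non-zero when the artifact is missing or too small, which is what lets AutoSys mark the validator job as FAILURE.

```bash
#!/usr/bin/env bash
# validate_output FILE [MIN_ROWS]: checks that FILE exists, is non-empty,
# and holds at least MIN_ROWS data rows after a one-line header.
# Returns 0 when valid, 1 when missing or empty, 2 when too few rows.
validate_output() {
  local file=$1 min_rows=${2:-1} rows
  if [ ! -s "$file" ]; then
    echo "FAIL: $file is missing or empty" >&2
    return 1
  fi
  rows=$(( $(wc -l < "$file") - 1 ))   # subtract the header line
  if [ "$rows" -lt "$min_rows" ]; then
    echo "FAIL: $file has $rows data rows, expected at least $min_rows" >&2
    return 2
  fi
  echo "OK: $file has $rows data rows"
}

# As a standalone script, the last line would be:
#   validate_output "$@"
```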

The Machine-Level View WCC Hides: Why Your Boxes Stall

Hot take: WCC's 'Monitor > Jobs' view is a lie. It shows you job status on a machine, but it will never tell you why the machine itself is the bottleneck. I've spent three hours tracing a box that kept showing 'INACTIVE' while all its children were 'RUNNING' — turns out the agent had dropped a heartbeat, but WCC still showed the machine as 'ON'.

WCC polls agents via TCP/7600. If the agent responds with a stale status (e.g., from a zombie process holding the lock), WCC trusts it. You see a cascade of 'FAILURE' on downstream jobs and assume the job broke — but really, the agent needed a restart yesterday.

The answer: build a separate heartbeat job that pokes the agent every 2 minutes and writes to a watchdog box. Then condition your real work on that box being 'SUCCESS'. If the box goes 'FAILURE', your entire pipeline stops hard — and you know it's infrastructure, not code.

Don't rely on the WCC machine dashboard alone. Trust, but verify — with a job.

agent_heartbeat_watchdog.jil (JIL)
/* io.thecodeforge: devops tutorial */
/* Machine-level health via a job, not the WCC dashboard */
/* scheduler_host and app_01 are placeholder machine names */

/* Watchdog box, scheduled every 2 minutes */
insert_job: agent_heartbeat_box
job_type: box
date_conditions: 1
days_of_week: all
start_mins: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58

/* Pings the agent on app_01; assumes autoping exits non-zero when the agent does not respond */
insert_job: heartbeat_app_01
job_type: cmd
box_name: agent_heartbeat_box
machine: scheduler_host
command: autoping -m app_01
alarm_if_fail: 1

/* Downstream work depends on the heartbeat box */
insert_job: production_pipeline
job_type: box
condition: s(agent_heartbeat_box)
Output
2024-05-15 10:32:01 INFO agent_heartbeat_box activated
2024-05-15 10:32:02 INFO heartbeat_app_01 STARTING
2024-05-15 10:32:15 INFO heartbeat_app_01 SUCCESS
2024-05-15 10:34:00 INFO production_pipeline condition met - starting
Senior Shortcut:
Your agent isn't healthy just because WCC says so. Run a 'pingagent' job every 2 minutes in a watchdog box and condition your whole flow on it. One box fail, everything stops — you isolate the root cause instantly.
Key Takeaway
Never trust the WCC machine dashboard alone. A watchdog job is the only honest health check.

The Hidden Value of WCC's History View: Why You're Debugging Blind

When a job fails after a week of clean runs, most devs open the job definition, tweak something random, and rerun. That's cargo-cult debugging. You need to see what changed — in the environment, not the code. WCC's History view (Administer > Reports > Job History) is the tool nobody uses properly.

Export the history for that job over the past 30 days. Look for patterns: the job consistently runs 3 seconds on success, but 11 seconds on the last run before failure. That's a resource leak. Or the job finishes at 2:01 AM EXACTLY every day, but yesterday it drifted to 2:14 AM — and then at 2:15 AM a calendar-based batch job kicked in and stole CPU.

The History view exposes shift patterns in run times, exit codes, and resource contention windows. Sort by duration, filter by exit code != 0, then look at the preceding run's duration. A sudden slowdown always precedes a failure. That's your debugging window.

Apply this: set up a WCC report that emails you any job whose runtime exceeds 3x its moving average over 7 runs. Catch the slowdown, fix the resource issue, and watch your mean time to resolution drop.

history_anomaly_report.jil (JIL)
/* io.thecodeforge: devops tutorial */
/* Report on jobs exceeding 3x their moving-average runtime */
/* ops_host_01 is a placeholder machine name */

insert_job: anomaly_detection_report
job_type: cmd
machine: ops_host_01
command: /opt/scripts/wcc_anomaly_report.sh --window 7 --threshold 3.0
date_conditions: 1
days_of_week: all
start_mins: 0,10,20,30,40,50
max_run_alarm: 1

/* The script queries job run history, filters, and alerts */
/* Sample output: */
Output
ALERT: Job 'load_inventory' runtime 45s (avg 12s, ratio 3.75)
Previous 7 runs: 11, 13, 12, 10, 11, 13, 12
Check for: file size growth, db lock contention, memory pressure
Senior Shortcut:
Don't wait for a fail. Track runtime moving averages in WCC History. A job that doubles its runtime is a ticking bomb. Catch it before it detonates.
Key Takeaway
Sudden runtime drift is the canary in the coal mine. Monitor it, or get woken up at 3 AM.
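The moving-average check at the heart of such a report is simple arithmetic. The sketch below is an assumption about how a script like wcc_anomaly_report.sh could work, not its actual contents: it takes run durations in seconds on stdin, oldest first, and compares the latest run against the average of the preceding window. How the durations are pulled from job history is left out.

```bash
#!/usr/bin/env bash
# runtime_anomaly [WINDOW] [THRESHOLD]: reads one duration (seconds) per line,
# oldest first. Compares the last value with the mean of the WINDOW values
# before it. Exits 1 and prints ALERT when the ratio reaches THRESHOLD.
runtime_anomaly() {
  awk -v window="${1:-7}" -v threshold="${2:-3.0}" '
    NF { d[++n] = $1 }
    END {
      if (n < window + 1) { print "SKIP: need " (window + 1) " runs, got " n; exit 0 }
      for (i = n - window; i < n; i++) sum += d[i]
      avg = sum / window
      if (avg <= 0) { print "SKIP: average runtime is zero"; exit 0 }
      ratio = d[n] / avg
      if (ratio >= threshold + 0) {
        printf "ALERT: runtime %ds (avg %.1fs, ratio %.2f)\n", d[n], avg, ratio
        exit 1
      }
      printf "OK: runtime %ds (avg %.1fs, ratio %.2f)\n", d[n], avg, ratio
    }'
}

# Example with the seven runs from the sample alert, followed by a 45s run:
#   printf '%s\n' 11 13 12 10 11 13 12 45 | runtime_anomaly 7 3.0
```

Computed without rounding, the seven earlier runs average about 11.7s, so the 45s run comes out at a ratio of roughly 3.84, slightly above the rounded figures in the sample alert.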
Production incident · Post-mortem · Severity: high

The Night an Upstream File Watcher Stalled the Entire Batch Run

Symptom
Batch run stalled around 2 AM. Multiple downstream jobs showing ON HOLD or ACTIVATED with no error. The team knew something was wrong but couldn't see the full picture.
Assumption
They assumed the database was slow, so they started restarting database connections. That didn't fix it.
Root cause
A file watcher job that verified an upstream file had failed because the file was missing. That failure cascaded, but because the team was looking at individual job statuses instead of the Flow View, they couldn't see the parent-child relationships.
Fix
Opened WCC, switched to Flow View, zoomed out to see the entire dependency graph. One red node (the file watcher) with a chain of blocked downstream jobs. Manually forced the file watcher (after verifying the file was uploaded), then released all blocked jobs via sendevent.
Key lesson
  • Always start incident response with Flow View — it shows the blast radius in one glance.
  • Never trust a single job status in isolation; always check upstream dependencies.
  • WCC's Flow View is the fastest way to find the root cause in a multi-job chain.
Production debug guide · Symptom → Action: use WCC as your first diagnostic tool (4 entries)
Symptom · 01
Job stuck in ACTIVATED status but not running
Fix
Check Flow View for a red upstream ancestor. Use Filter to show ON HOLD/FAILURE ancestors. The problem is almost always a dependency not met.
Symptom · 02
Job failed with exit code but no error log
Fix
Click the job in Monitor > Jobs, then 'View Log' or check the job's std_out_file/std_err_file. If WCC doesn't show the log, SSH to the agent and check the file directly.
Symptom · 03
Multiple jobs suddenly ON HOLD or INACTIVE with no obvious cause
Fix
Open Flow View, select 'Show Dependencies', and look for a single red node near the top of the chain. That's the root cause.
Symptom · 04
WCC shows job status as RUNNING but it's been running for 4+ hours
Fix
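Use autorep -J job_name -d to get the exact start time and run details, then check on the agent whether the process is still alive. If it is dead or hung, terminate the job with sendevent -E KILLJOB -J job_name (FORCE_STARTJOB starts a job; it does not kill one). Then investigate why it hung.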
Quick WCC Debug Cheat Sheet
When you're paged at 3 AM, these are the first three things to do in WCC.
Batch run stalled — no visible progress
Immediate action
Open WCC → Monitor > Flow View. Zoom out to see the full graph. Look for red nodes.
Commands
In Filter field, select status = FAILURE. This highlights all failed jobs instantly.
Click on the first red node in the chain (top of the failure cascade) to view its log path.
Fix now
If it's a file watcher that failed due to missing file, verify the file exists, then right-click → Force Start. Otherwise, right-click → Rerun Job.
Job is stuck in ACTIVATED status
Immediate action
Select the job in Monitor > Jobs. Click the 'Depends' tab to see its conditions.
Commands
Run autorep -J job_name -d to see the exact dependency condition and current status of upstream jobs.
If an upstream job is ON HOLD, release it via right-click → Release. If it's FAILED, you need to force start it first.
Fix now
Force start the upstream job after resolving the root cause, then release this job.
WCC not loading or returning errors
Immediate action
SSH to the AutoSys server and check Tomcat process: ps -ef | grep tomcat
Commands
Check port: ss -tlnp | grep 8080 (or 8443 for HTTPS)
If Tomcat is running but WCC is unresponsive, restart it: $AUTOSYS/wcc/bin/shutdown.sh && $AUTOSYS/wcc/bin/startup.sh
Fix now
If WCC is completely down, use command-line: autorep for status, sendevent for job control, jil for job definitions. WCC is not required for operations.
WCC Views vs CLI Equivalents
WCC View | What it shows | CLI equivalent
Monitor > Jobs | All jobs with real-time status | autorep -J %
Monitor > Flow View | Visual dependency graph | No direct equivalent (use autorep -J % -q to see conditions, or job_depends -c -J job_name for one job)
Monitor > Job Activity | Timeline of job runs | autorep -J % -r -1 (previous run)
Administration > Machines | Agent machine statuses | autorep -M %
Administration > Calendars | Calendar definitions | autocal_asc -r calendar_name

Key takeaways

1. WCC (Workload Control Center) is the browser-based monitoring dashboard for AutoSys, hosted on Tomcat.
2. Flow View shows jobs as a visual dependency graph, essential for understanding the blast radius of a failure.
3. Every WCC action is equivalent to a sendevent command; WCC is a visual wrapper, not a different system.
4. If WCC is down, all monitoring and operations can continue via command-line tools (autorep, sendevent, autostatus).
5. WCC's auto-refresh has 5-30 second latency; always cross-check critical actions with CLI.
6. Use filters aggressively in WCC to reduce noise and focus on the failure chain during incidents.

Common mistakes to avoid


Not bookmarking WCC in your on-call browser

Symptom
When paged at 3 AM, you waste precious minutes searching for the URL or asking teammates. Stress increases.
Fix
Add WCC URL as a bookmark with a label like 'AutoSys WCC'. Keep it in a dedicated on-call browser profile or a shared team wiki.

Using WCC for bulk operations

Symptom
WCC becomes sluggish when you need to force start 50 jobs. Clicking each job individually takes too long.
Fix
Use a CLI loop, for example: for job in job_a job_b job_c; do sendevent -E FORCE_STARTJOB -J "$job"; done. WCC is for visibility, not bulk actions.
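For anything beyond a handful of jobs, a dry-run step is a cheap safeguard. The sketch below is an illustration with hypothetical names: it reads job names from a file, one per line, skips blanks and comment lines, and only prints the sendevent commands unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# bulk_force_start LIST_FILE: force starts every job named in LIST_FILE.
# With DRY_RUN unset or set to 1 it only prints the commands it would run.
bulk_force_start() {
  local list=$1 job
  while IFS= read -r job || [ -n "$job" ]; do
    case $job in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "sendevent -E FORCE_STARTJOB -J $job"
    else
      sendevent -E FORCE_STARTJOB -J "$job"
    fi
  done < "$list"
}

# Usage:
#   bulk_force_start restart_list.txt            # review the printed commands
#   DRY_RUN=0 bulk_force_start restart_list.txt  # actually send the events
```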

Confusing WCC's refresh cycle with real-time

Symptom
After a Force Start in WCC, you see it still as FAILURE because the refresh hasn't happened yet. You may double-execute.
Fix
WCC auto-refreshes every ~30 seconds. For immediate status, use autostatus -J job_name or autorep -J job_name -q.

Leaving WCC sessions open across multiple tabs

Symptom
One tab shows a job as RUNNING, another shows SUCCESS. Conflicting data causes incorrect decisions.
Fix
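Work with a single WCC tab during an incident and refresh it before acting on what it shows. If you need multiple views, note that windows and tabs in the same browser profile normally share one session cookie; use a separate browser profile or a private window when you need an independent session.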

Interview Questions on This Topic

Q01 (Junior): What is WCC in AutoSys?
Q02 (Junior): What is the Flow View in WCC and when would you use it?
Q03 (Junior): How do you access WCC?
Q04 (Senior): Can you perform all AutoSys operations from WCC or only some?
Q05 (Senior): What do you do if WCC is down but you still need to monitor and manage j...

Q01 of 05 (Junior)

What is WCC in AutoSys?

ANSWER
WCC (Workload Control Center) is the browser-based UI for AutoSys. It provides real-time job monitoring, visual dependency flow views, calendar management, and machine status. It runs on Apache Tomcat, which AutoSys installs automatically. WCC is a convenience layer — all operations can be performed via CLI.
Frequently Asked Questions

01. What is WCC in AutoSys?
02. What port does WCC use?
03. What is the Flow View in WCC?
04. Can I manage AutoSys without WCC?
05. What should I check if WCC is not loading?
06. Does WCC work during an Active-Active cluster?
07. How do I extend the WCC session timeout?
