Idle-in-Transaction Detection — pg_stat_activity Alerting
Dead rows hit 200% of live data when idle transactions blocked autovacuum for hours.
- MySQL and PostgreSQL monitoring uses the Prometheus + Grafana stack: exporters scrape internal metrics (performance_schema, pg_stat_*) every 15s
- Core metrics to watch: Threads_running (MySQL), active queries (PG), buffer pool hit rate, VACUUM health, lock contention
- mysqld_exporter on port 9104, postgres_exporter on port 9187 — both expose /metrics endpoints for Prometheus
- Pre-built Grafana dashboards: MySQL ID 7362, PostgreSQL ID 9628 — import in 30 seconds, then customize thresholds
- Production insight: idle-in-transaction connections in PG block VACUUM and cause silent table bloat — alert on state='idle in transaction' lasting >2min
- Biggest mistake: using exporter with database superuser — create a dedicated read-only monitoring user with minimum grants
Every production application eventually hits the same wall: something goes wrong with the database at 2 AM, a customer calls to complain, and you're staring at logs trying to figure out what happened 40 minutes ago. Slow queries, connection pool exhaustion, disk space creeping toward 100% — these aren't hypothetical problems. They happen to every team, at every scale, and they're almost always preventable with the right visibility. Database monitoring isn't optional infrastructure — it's the difference between being proactive and being constantly reactive.
MySQL and PostgreSQL are the two most widely deployed open-source databases in the world, and each has its own internal metrics, query execution engine, and failure modes. A tool that works brilliantly for MySQL's InnoDB buffer pool might give you almost nothing useful for PostgreSQL's VACUUM process. Understanding not just which tools exist, but what they actually measure and why those metrics matter, is what separates engineers who can debug production incidents in minutes from those who spend hours guessing.
By the end of this article you'll know how to query the internal monitoring views of both MySQL and PostgreSQL to get real-time health data, how to set up Prometheus with the right exporters to scrape those metrics automatically, how to visualize everything in Grafana, and — most importantly — you'll understand exactly why each metric matters so you can make smart decisions when something breaks. We'll also cover the traps that catch most engineers off guard the first time they set this up.
What Your Database Is Actually Telling You — Core Metrics That Matter
Before you install a single monitoring tool, you need to speak your database's language. Both MySQL and PostgreSQL expose rich internal metrics through system views and information schema tables. The problem is there are hundreds of them, and most tutorials just dump a list without explaining which ones actually predict failure.
Think of database health in three layers: throughput (how much work is being done), latency (how long that work takes), and resource pressure (how close you are to running out of something critical). Every useful metric falls into one of these buckets.
For MySQL, the most critical real-time signals live in performance_schema and information_schema. For PostgreSQL, pg_stat_activity, pg_stat_bgwriter, and pg_locks are your best friends. The queries below show you how to pull actionable snapshots from each — not just raw numbers, but numbers with context.
One critical thing most developers miss: a single metric in isolation is almost meaningless. A query taking 500ms is fine for a reporting job, catastrophic for a login endpoint. Always correlate metrics with the workload context before raising the alarm.
PostgreSQL Deep-Dive — Querying pg_stat Views for Real Operational Insight
PostgreSQL's monitoring story is fundamentally different from MySQL's. Where MySQL centralizes most metrics in performance_schema, PostgreSQL scatters them across a family of pg_stat_* views, each focused on a specific subsystem. This design is actually more powerful once you know where to look, because each view gives you surgical precision instead of one giant table.
The three views you absolutely must understand are pg_stat_activity (who's doing what right now), pg_stat_bgwriter (how hard your background writer is working), and pg_locks (who's waiting on whom). Together they cover the three most common categories of PostgreSQL incidents: long-running queries, I/O bottlenecks, and lock contention.
There's also a uniquely PostgreSQL problem that has no MySQL equivalent: table bloat from VACUUM. PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means old row versions pile up in the table file until VACUUM cleans them. If VACUUM falls behind — because it's disabled, misconfigured, or being blocked by a long transaction — your tables can grow to 10x their logical size and queries slow to a crawl. pg_stat_user_tables exposes exactly this data.
The queries below are battle-tested snippets we'd use in a real production incident response runbook.
Prometheus + Exporters + Grafana — Building a Production Monitoring Stack
Running those SQL queries by hand is great for incident triage, but you can't watch a terminal 24/7. What you want is a system that scrapes those metrics automatically every 15 seconds, stores them historically so you can see trends, and alerts you before the disk fills up or the query latency spikes. That's exactly what the Prometheus + Grafana stack does.
The architecture is simple: Prometheus is a time-series database that periodically scrapes HTTP endpoints called 'exporters'. For MySQL there's mysqld_exporter, for PostgreSQL there's postgres_exporter. Each exporter connects to your database, runs the equivalent of the queries above, and exposes the results as a /metrics HTTP endpoint that Prometheus can consume. Grafana then connects to Prometheus and lets you build dashboards with those metrics.
What makes this stack genuinely powerful isn't the dashboards — it's Prometheus's alerting rules. You write a rule like 'if the PostgreSQL dead_row_pct for any table stays above 30% for more than 10 minutes, fire an alert'. Grafana Alerting or Prometheus Alertmanager then notifies your Slack channel or PagerDuty. You stop checking dashboards and start only seeing problems when they actually need attention.
The configuration below gets you from zero to a working stack with both exporters running.
Prometheus Alert Rules — Catching Failures Before Your Users Do
Dashboards are reactive. Alert rules are proactive. The real value of this monitoring stack isn't the pretty graphs — it's the 3 AM Slack message that says 'PostgreSQL table user_sessions is 46% dead rows — VACUUM may be blocked' before a developer even notices slowness.
Prometheus alert rules use PromQL, Prometheus's query language. They follow a simple pattern: define an expression, a threshold, a duration (how long the condition must persist before alerting), and a severity. The duration is critical — without it, you'll get flooded with noisy one-off alerts for transient spikes that self-resolve.
The rules below cover the five scenarios that account for the vast majority of MySQL and PostgreSQL production incidents: connection exhaustion, query latency spikes, replication lag, disk space pressure, and PostgreSQL VACUUM falling behind. Treat these as your starting template — tune the thresholds to match your specific workload after you've observed normal baseline values for a week.
Debugging Your Monitoring Stack — Common Failures and Fixes
Your database monitoring stack itself can break. Exporters stop responding, Prometheus loses connection, Grafana panels show 'No data'. When that happens you're flying blind — exactly when you need monitoring the most.
Most monitoring stack failures fall into three categories: configuration drift (scrape targets change but prometheus.yml doesn't), credential rotation (database passwords change but exporter DATA_SOURCE_NAME doesn't), and version mismatches (exporter updated and metrics renamed).
The quickest way to verify the whole chain: start at the exporter, then Prometheus, then Grafana. Don't jump to conclusions. A common mistake is debugging Grafana panels when the exporter itself is down.
Below are the debugging commands and patterns we use in production to restore monitoring fast.
| Aspect | MySQL (mysqld_exporter) | PostgreSQL (postgres_exporter) |
|---|---|---|
| Primary metrics source | performance_schema + information_schema | pg_stat_* system views |
| Connection metric to watch | Threads_running (not Threads_connected) | pg_stat_activity WHERE state='active' |
| Memory health indicator | InnoDB buffer pool hit rate (target 99%+) | shared_buffers cache hit rate (target 95%+) |
| Unique failure mode | Auto-increment column overflow (silent data corruption) | VACUUM falling behind (table bloat + query slowdown) |
| Replication monitoring | Seconds_Behind_Master via SHOW SLAVE STATUS | pg_replication_lag in seconds from pg_stat_replication |
| Lock detection | information_schema.INNODB_TRX + INNODB_LOCKS | pg_locks JOIN pg_stat_activity |
| Slow query history | performance_schema.events_statements_summary_by_digest | pg_stat_statements extension (must be installed) |
| Required permissions for exporter | SELECT, PROCESS, REPLICATION CLIENT | pg_monitor role (PostgreSQL 10+) or manual GRANT |
| Prometheus exporter port | 9104 (default) | 9187 (default) |
| Best free Grafana dashboard | Dashboard ID 7362 | Dashboard ID 9628 |
Key Takeaways
- Threads_running in MySQL (not Threads_connected) is the real concurrency alarm — it tells you how many queries are actually executing right now, not just connected and idle.
- PostgreSQL's 'idle in transaction' state is the silent killer — it holds locks, blocks VACUUM, and causes table bloat without showing any obviously 'stuck' queries in most monitoring setups.
- The Prometheus + mysqld_exporter/postgres_exporter + Grafana stack costs zero dollars and gives you historical trending, alerting, and dashboards that commercial tools charge thousands for monthly.
- Always create a dedicated read-only monitoring user for your exporters — never point monitoring tools at your application user or superuser, even in development, because habits formed in dev become incidents in prod.
- Alert 'for' durations are critical: too short causes alert fatigue, too long causes missed incidents. Tune each alert based on how fast the metric changes and how quickly it impacts users.
- Debug monitoring stack failures from bottom up: exporter → Prometheus → Grafana. Start at the data source, not the dashboard.
- Import community Grafana dashboards (7362 for MySQL, 9628 for PostgreSQL) — they track every metric that matters and save you weeks of development time.
Common Mistakes to Avoid
- Running the database exporter as a superuser/root
Symptom: Works fine in development, but if the exporter is compromised, the attacker has full database access. In production, security audits will flag this as a critical vulnerability.
Fix: Create a dedicated monitoring user with minimum required grants. For MySQL: GRANT SELECT, PROCESS, REPLICATION CLIENT ON . TO 'monitoring'@'localhost'. For PostgreSQL 10+: GRANT pg_monitor TO monitoring_role. The exporter only needs to READ, never write. - Setting Prometheus scrape_interval too aggressively (every 1 second)
Symptom: The exporter itself becomes a performance problem — the database shows 'monitoring overhead' in its own slow query log. Prometheus may also miss data because exporters can't keep up with the scraping rate.
Fix: Use 15s for most metrics, 60s for heavy collection like events_statements_summary. Match your scrape interval to how quickly the metric actually changes. Threads_running changes fast, table sizes change slowly. - Ignoring the 'idle in transaction' state in pg_stat_activity and only alarming on 'active' queries
Symptom: PostgreSQL VACUUM gets blocked by a session that appears idle, tables balloon with dead rows, all queries slow down gradually over days, and no alert fires because no query looks 'stuck'.
Fix: Add a separate alert specifically for connections in 'idle in transaction' state lasting more than 60 seconds. This catches the application bug (missing COMMIT/ROLLBACK) before it causes table bloat. - Using localhost instead of service name in Docker Compose network for Prometheus scrape targets
Symptom: Prometheus targets show DOWN because from inside the Prometheus container, 'localhost' points to its own container, not the exporter container.
Fix: Use the Docker Compose service name as the hostname (e.g., 'mysqld_exporter:9104' instead of 'localhost:9104'). Prometheus and exporters in the same Docker network resolve service names automatically. - Not setting up alerts for exporter health itself
Symptom: The monitoring stack goes down silently. A database failure happens two hours later and you have no historical data to debug from.
Fix: Add a Prometheus alert for 'AbsentData' or use Grafana's alerting on metric availability. Also set up a separate cron job that periodically curls the exporter endpoints and notifies if they are unreachable.
Interview Questions on This Topic
- QIf a production MySQL server suddenly shows Threads_running spiking to 80 but Threads_connected is only 95 out of a max of 500, what does that tell you and what's your first investigation step?SeniorReveal
- QA PostgreSQL table that had queries running in 10ms is now taking 2 seconds with no schema changes. You check pg_stat_activity and see nothing unusual. What other views do you check and what are you looking for?SeniorReveal
- QSomeone asks you to set up alerting for database disk space. They suggest setting a single alert at 5% free space. What's wrong with that approach and how would you design the alerting strategy instead?Mid-levelReveal
- QHow would you monitor auto-increment column overflow in MySQL? What happens when the column reaches its maximum value?Mid-levelReveal
Frequently Asked Questions
What is the best free database monitoring tool for MySQL and PostgreSQL?
The Prometheus + Grafana stack with mysqld_exporter and postgres_exporter is the industry-standard free option used by teams at every scale. It's free, open-source, runs in Docker, has pre-built community dashboards, and supports alerting via Alertmanager. For a managed option, Datadog and New Relic both have excellent database monitoring but charge per host.
How do I find which queries are slowing down my PostgreSQL database?
First, install the pg_stat_statements extension (CREATE EXTENSION pg_stat_statements) and add it to shared_preload_libraries in postgresql.conf. Then query SELECT query, calls, mean_exec_time, total_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 20. This shows you normalized queries sorted by average execution time — the top results are your optimization targets.
What is the difference between monitoring and observability for databases?
Monitoring tracks predefined metrics — you know in advance what you want to measure (query latency, connection count, cache hit rate) and you set thresholds on those. Observability is the ability to understand why your database is behaving the way it is from any angle, even for failure modes you didn't predict. Monitoring answers 'is something wrong?', observability answers 'why is it wrong?'. In practice, the pg_stat_* views and performance_schema are your observability layer — they let you ask new questions you didn't plan for.
Should I use the same exporter for development and production?
No. Use a separate monitoring user per environment with environment-specific passwords. In development, you can use a more permissive configuration to explore metrics, but production credentials must be strictly read-only and restricted to the exporter's IP or container. Never reuse passwords across environments — if a developer's local monitoring stack is compromised, production should not be affected.
My PostgreSQL exporter shows 'pg_stat_activity_count' but not 'pg_stat_user_tables'. What's wrong?
The postgres_exporter has multiple collectors, and not all are enabled by default. You need to explicitly enable the table statistics collector. Add the command-line flag --collect.stat_user_tables to the exporter startup command. For Docker, pass it in the 'command' section of the docker-compose.yml. Alternatively, use the newer postgres_exporter v0.15+ which has automatic metric collection — check the documentation for your version.
That's MySQL & PostgreSQL. Mark it forged?
5 min read · try the examples if you haven't