Time Series DB — Cardinality Explosion Crashed Monitoring
Prometheus 30s timeouts, InfluxDB OOM every 4h due to 1.
- Time series databases store timestamped measurements, optimized for high write throughput and time-range queries
- Storage engine decision: LSM-Tree for write-heavy (InfluxDB), B-Tree for mixed workloads (TimescaleDB), or custom append-only (Prometheus)
- Compression: delta-of-delta timestamps, XOR floating point, bit-packing — cuts storage 90%+
- Cardinality (number of unique metric series) is the #1 performance killer — keep under 100k per shard
- Retention policies define data lifecycle; downsampling preserves long-term trends while shedding raw data
- Production mistake: treating TSDB like a relational DB leads to slow queries and disk bloat within weeks
Imagine your fitness tracker records your heart rate every second — not just what it is right now, but a permanent, ordered log of every reading since you put it on. A time series database is like that log, but engineered specifically so you can ask questions like 'what was my average heart rate between 2pm and 3pm last Tuesday?' in milliseconds, even if there are billions of readings. It's a regular database's specialised cousin that obsesses over timestamps and makes time-based queries absurdly fast.
Every production system that matters emits a continuous stream of measurements — CPU usage spikes at 3am, a payment gateway latency ticks up by 12ms, a wind turbine's RPM drops before a bearing fails. These aren't events you look up by ID; they're facts anchored in time, and the questions you need to ask are always temporal: trends, anomalies, rolling averages, rate-of-change. General-purpose relational databases weren't built for this workload, and trying to make them do it at scale is one of the fastest ways to watch a perfectly good PostgreSQL instance beg for mercy.
Time series databases (TSDBs) exist because the write pattern, query pattern, and retention lifecycle of timestamped data are fundamentally different from transactional data. You almost never update a past measurement — the past is immutable. Writes are high-throughput and append-only. Queries almost always involve a time range plus aggregation. And data has a natural decay in value: last second's CPU reading matters more than last year's. A TSDB exploits all of these properties at every layer of its architecture — from how bytes land on disk to how queries are planned.
By the end of this article you'll understand exactly how TSDBs store and compress data internally, why cardinality is the silent killer of TSDB performance, how to make a principled choice between InfluxDB, TimescaleDB, and Prometheus for a given system design, and what production mistakes cost teams weeks of firefighting. This is the article you wish existed the first time a monitoring stack fell over at 2am.
What Makes a Database 'Time Series'?
A time series database is any storage system optimized for append-only writes of timestamped measurements. Unlike a traditional row store, a TSDB treats time as the primary sort key — data is always written in time order and queried by time range. This changes everything.
Relational databases organize data by entity (user, order, product). Time series databases organize by time first. That means writes go to the latest time chunk (hot shard), and compactions merge older chunks into read-optimised blocks. You never update a past value — the only operations are insert (new point) and delete (retention).
The query pattern is also different. You ask for 'average CPU over the last hour grouped by 5-minute windows' — not 'CPU value for server X at timestamp Y'. That's a range scan with aggregation, which TSDBs accelerate with time-based partitioning and pre-aggregated rollups.
- Data is always ordered by time — no random inserts.
- Past data is immutable: updates are inserts with a later timestamp.
- Queries are range scans with aggregation; point lookups are rare.
- Hot shard vs cold shard: recent data in memory, older data compressed on disk.
Storage Engine Internals: How TSDBs Organise Data on Disk
The storage engine determines write throughput, compression ratio, and query speed. Two dominant approaches exist: Log-Structured Merge (LSM) trees and custom B-Tree adaptations.
LSM Trees (InfluxDB, VictoriaMetrics, TimescaleDB hybrid)
Writes first land in a Write-Ahead Log (WAL), then are buffered in memory in a sorted data structure (memtable). When the memtable reaches a threshold (e.g., 64MB), it's flushed to disk as an immutable SSTable file. Background compaction merges multiple SSTables into larger ones, discarding tombstones and dead versions. This design gives excellent write throughput because every write is sequential. Reads require checking multiple layers (memory + several SSTables per level). Bloom filters accelerate point lookups, but range scans still need to merge overlapping SSTables.
B-Tree with Time-Ordered Primary Key (TimescaleDB, hybrid)
TimescaleDB uses PostgreSQL's B-Tree where the primary key is (time, optional hash) in a chunked table structure. Writes are still sequential-ish because time is monotonically increasing. It supports full SQL, joins, and relational flexibility. Write throughput is lower than LSM for pure time series, but query performance on small ranges is better due to efficient index lookups. B-Trees also have no write amplification from compaction — but they do have page splits and WAL overhead.
Custom Append-Only (Prometheus, M3DB)
Prometheus stores each time series in a block per 2-hour interval. Data is stored as delta-of-delta compressed chunks. The blocks are read-only once completed. This gives extremely fast read access to recent data but requires careful planning for long-term retention (usually offloaded to Thanos or Cortex).
Compression Techniques: How TSDBs Fit Petabytes into Gigabytes
Time series data is highly compressible because adjacent points are correlated — CPU utilization doesn't jump from 5% to 95% in a millisecond. TSDBs exploit this with specialized compression schemes.
Timestamp Compression: Delta-of-Delta
Instead of storing full 64-bit timestamps, TSDBs store the difference from the previous timestamp. Most timestamps arrive at a fixed interval (e.g., every 10s). The delta-of-delta is often zero or small. Prometheus uses this: for a batch of 122 points, it stores the first timestamp fully, then the first delta (usually the interval), then delta-of-delta values. When the interval is constant, delta-of-delta is zero and can be encoded in a single bit. This compresses 64-bit timestamps to an average of ~2 bits.
Value Compression: XOR (Gorilla)
Floating point values change slowly. Gorilla compression (originally from Facebook's Gorilla TSDB, now used in Prometheus) XORs consecutive values. If most bits are the same, the result has many leading zeros, which can be encoded with a variable-length scheme. For 64-bit doubles, this achieves ~1.4 bits per point on average for normal metrics.
Columnar Compression
InfluxDB stores each field in separate columns (similar to Parquet). Within a column, values are sorted by time first, so run-length encoding (RLE) and delta compression work well. TimescaleDB's compression uses native PostgreSQL compression types (ZSTD, LZ4) on chunks, achieving 5–20x reduction.
Real-world results: a 7-day retention of 10k series at 10s resolution (~60 million points) occupies ~12GB uncompressed, ~400MB compressed.
Retention Policies and Downsampling — Managing Storage Over Time
Time series data has diminishing value over time. The exact CPU reading from 3 months ago isn't useful — but the hourly average is. TSDBs manage this through retention policies and downsampling.
Retention Policies (RP)
InfluxDB's RP defines how long data is kept and how many copies exist. For example, 'autogen' RP with DURATION 7d, REPLICATION 1. Once data exceeds the duration, it's automatically deleted. TimescaleDB uses chunk-based retention: you can drop chunks older than a threshold using or automate with a background job. Prometheus stores data in blocks per retention period; blocks older than drop_chunks()--storage.tsdb.retention.time are deleted.
Downsampling / Continuous Queries
To keep long-term historical data without blowing your storage budget, you create rollups — e.g., store raw data at 10s resolution for 7 days, then 1-minute averages for 30 days, then 1-hour averages for 1 year. InfluxDB calls this Continuous Queries (CQ). TimescaleDB uses and continuous aggregates. Prometheus recording rules can precompute rollups.time_bucket()
Shard Groups and Time-Based Partitioning
Data is written to shard groups that correspond to time ranges (e.g., 7-day shards). Each shard is stored as a set of files (TSM files in InfluxDB, chunks in Prometheus). When a shard is closed (no longer receiving writes), it can be fully compressed and moved to cold storage or even archived to object storage (e.g., S3) using tools like Thanos or InfluxDB Cloud's tiered storage.
- Hot tier: raw data in memory (last 6h) — fastest queries, expensive storage.
- Warm tier: compressed data on SSD (last 7 days) — good performance, moderate cost.
- Cold tier: downsampled rollups on HDD/object storage (months/years) — cheap, slower queries.
Cardinality: The Silent Performance Killer
Cardinality is the number of unique time series in your TSDB. Each combination of metric name + label key/value pair creates a distinct series. For Prometheus: http_requests_total{method="GET", endpoint="/api", status="200"} is one series. Add a label user_uuid with 10,000 different values, and you get 10,000 series — for just this one metric.
Why cardinality matters
- Index memory: TSDBs keep an in-memory index of all series. High cardinality consumes RAM and slows down writes because every new label combination must be added to the index (usually requires a lock or CAS operation).
- Query performance: Queries that scan many series (e.g., 'sum of all requests') must iterate over thousands of series. Even with aggregation, the planner needs to evaluate each series.
- Compaction overhead: More series means more SSTable files (in LSM) or more blocks (Prometheus). Compaction merges many small files, increasing CPU and I/O.
Safe cardinality thresholds
- Prometheus: keep total series under 500k per server (for 8GB RAM). Use --storage.tsdb.max-series to enforce a hard limit.
- InfluxDB: keep series per measurement under 100k. Use max-series-per-database config.
- TimescaleDB: less sensitive because indexing is disk-based (B-Tree), but high cardinality still hurts query performance. Keep distinct tag values under 1,000 per dimension.
How cardinality explosions happen
Common culprits: Including request IDs, timestamps, or random values as labels. For example, adding a 'container_id' label in a Kubernetes cluster with 500 pods creates 500 series per metric. Add 'pod_name' with unique names and it multiplies. Multiply by every redeployment (new container IDs), and you get unbounded growth.
Choosing the Right TSDB: InfluxDB, TimescaleDB, or Prometheus
You don't start with 'which TSDB is best?' You start with 'what's my workload?' The answer hinges on three things: write throughput, query complexity, and ecosystem.
InfluxDB - Best for: High-volume IoT metrics, application monitoring, sensor data. - Engine: LSM-based (TSM over RocksDB). Strong compression, high write throughput (millions of points/sec on a single node). - Query: InfluxQL or Flux (deprecating Flux in 3.x). No SQL, limited joins. - Retention: Native RP and CQ. Downsampling built-in. - Scalability: Single-node OSS up to ~1M series. Enterprise/Cloud for clustering. No built-in federation.
TimescaleDB - Best for: Hybrid workloads — time series that occasionally need SQL joins with relational data (e.g., financial trades, sensor + metadata). - Engine: PostgreSQL with automatic partitioning (hypertable). Full SQL support, JSON, JOINs, window functions. - Query: PostgreSQL SQL. Rich ecosystem of tools. - Retention: Via chunk dropping or timescaledb_expire chunk job. - Scalability: Single-node with streaming replication for HA. Distributed version (timescaledb scale) since 2.13.
Prometheus - Best for: Cloud-native monitoring, Kubernetes, alerting tied to metrics. - Engine: Custom block-based storage. Pull model (or push via Pushgateway). - Query: PromQL — powerful, but different from SQL. Excellent for alerting rules. - Retention: Block-based with configurable retention. Long term via Thanos/Cortex. - Scalability: Single-server limited to ~1M series. Use Thanos for aggregation across clusters.
When to avoid each - Avoid InfluxDB if you need SQL joins or complex relational queries. - Avoid TimescaleDB if your write throughput exceeds 500k points/sec on a single node (use LSM alternatives). - Avoid Prometheus if you need to push metrics from many sources without a central coordinator (use Pushgateway carefully).
Cardinality Explosion — The 3am Outage That Took Down Monitoring
- Always set a hard cardinality limit in TSDB configuration before ingesting production data.
- Monitor cardinality as a first-class metric — track distinct series count per measurement over time.
- Design label schemas to keep distinct values under 10 per dimension. Pod IDs belong in logs, not metrics.
top and free -m on the TSDB node. Run influx_inspect report-shards (InfluxDB) or tsdb analyze (Prometheus) to identify large shards. Merge or downsample old shards.SHOW SERIES CARDINALITY or SHOW TAG KEYS to identify high-cardinality tags. Remove or relabel immediately. Increase shard retention duration to reduce index rebuild frequency.influxd_inspect dumptsm (InfluxDB) or tsdb retention (TimescaleDB). Ensure compression is enabled for each column. For Prometheus, verify --storage.tsdb.retention.time and --storage.tsdb.max-block-duration are set correctly.--storage.tsdb.min-block-duration=2h to reduce memory for in-memory blocks. For InfluxDB, reduce max-series-per-database in config. Add memory request/limit to pod spec.Key takeaways
Common mistakes to avoid
5 patternsUsing high-cardinality labels like 'user_id' or 'request_id'
Not setting retention policies before production
CREATE RETENTION POLICY "30d" ON "mydb" DURATION 30d REPLICATION 1. For Prometheus: set --storage.tsdb.retention.time=30d.Writing data points out of order (backfilling with random timestamps)
timestamp_precision set.Ignoring compression impact on query speed
Assuming TSDBs handle write spikes gracefully
write-timeout and max-values-per-tag to reject excessive data.Interview Questions on This Topic
Explain how time series data differs from transactional data and how a TSDB exploits those differences.
Frequently Asked Questions
That's Databases in Design. Mark it forged?
8 min read · try the examples if you haven't