Senior 8 min · March 06, 2026

Time Series DB — Cardinality Explosion Crashed Monitoring

Prometheus 30s timeouts, InfluxDB OOM every 4h due to 1.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Time series databases store timestamped measurements, optimized for high write throughput and time-range queries
  • Storage engine decision: LSM-Tree for write-heavy (InfluxDB), B-Tree for mixed workloads (TimescaleDB), or custom append-only (Prometheus)
  • Compression: delta-of-delta timestamps, XOR floating point, bit-packing — cuts storage 90%+
  • Cardinality (number of unique metric series) is the #1 performance killer — keep under 100k per shard
  • Retention policies define data lifecycle; downsampling preserves long-term trends while shedding raw data
  • Production mistake: treating TSDB like a relational DB leads to slow queries and disk bloat within weeks
Plain-English First

Imagine your fitness tracker records your heart rate every second — not just what it is right now, but a permanent, ordered log of every reading since you put it on. A time series database is like that log, but engineered specifically so you can ask questions like 'what was my average heart rate between 2pm and 3pm last Tuesday?' in milliseconds, even if there are billions of readings. It's a regular database's specialised cousin that obsesses over timestamps and makes time-based queries absurdly fast.

Every production system that matters emits a continuous stream of measurements — CPU usage spikes at 3am, a payment gateway latency ticks up by 12ms, a wind turbine's RPM drops before a bearing fails. These aren't events you look up by ID; they're facts anchored in time, and the questions you need to ask are always temporal: trends, anomalies, rolling averages, rate-of-change. General-purpose relational databases weren't built for this workload, and trying to make them do it at scale is one of the fastest ways to watch a perfectly good PostgreSQL instance beg for mercy.

Time series databases (TSDBs) exist because the write pattern, query pattern, and retention lifecycle of timestamped data are fundamentally different from transactional data. You almost never update a past measurement — the past is immutable. Writes are high-throughput and append-only. Queries almost always involve a time range plus aggregation. And data has a natural decay in value: last second's CPU reading matters more than last year's. A TSDB exploits all of these properties at every layer of its architecture — from how bytes land on disk to how queries are planned.

By the end of this article you'll understand exactly how TSDBs store and compress data internally, why cardinality is the silent killer of TSDB performance, how to make a principled choice between InfluxDB, TimescaleDB, and Prometheus for a given system design, and what production mistakes cost teams weeks of firefighting. This is the article you wish existed the first time a monitoring stack fell over at 2am.

What Makes a Database 'Time Series'?

A time series database is any storage system optimized for append-only writes of timestamped measurements. Unlike a traditional row store, a TSDB treats time as the primary sort key — data is always written in time order and queried by time range. This changes everything.

Relational databases organize data by entity (user, order, product). Time series databases organize by time first. That means writes go to the latest time chunk (hot shard), and compactions merge older chunks into read-optimised blocks. You never update a past value — the only operations are insert (new point) and delete (retention).

The query pattern is also different. You ask for 'average CPU over the last hour grouped by 5-minute windows' — not 'CPU value for server X at timestamp Y'. That's a range scan with aggregation, which TSDBs accelerate with time-based partitioning and pre-aggregated rollups.

io/thecodeforge/tsdb/TsdbWriteExample.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
package io.thecodeforge.tsdb;

import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Minimal TSDB write model — append-only, time-ordered.
 * Real TSDBs persist to WAL then to an LSM tree.
 */
public class TsdbWriteExample {

    // Timeseries data: time -> value
    private final NavigableMap<Long, Double> series;

    public TsdbWriteExample() {
        this.series = new ConcurrentSkipListMap<>();
    }

    public void ingest(String metric, double value, Instant timestamp) {
        // Append-only: no update of existing keys
        series.put(timestamp.toEpochMilli(), value);
    }

    public double queryAverage(long fromEpochMs, long toEpochMs) {
        var sub = series.subMap(fromEpochMs, true, toEpochMs, true).values();
        return sub.stream().mapToDouble(d -> d).average().orElse(0.0);
    }

    public static void main(String[] args) {
        TsdbWriteExample db = new TsdbWriteExample();
        db.ingest("cpu.user", 0.45, Instant.now());
        db.ingest("cpu.user", 0.47, Instant.now().plusSeconds(1));
        double avg = db.queryAverage(Instant.now().minusSeconds(10).toEpochMilli(),
                                      Instant.now().toEpochMilli());
        System.out.println("Average CPU (last 10s): " + avg);
    }
}
Time as Primary Key
  • Data is always ordered by time — no random inserts.
  • Past data is immutable: updates are inserts with a later timestamp.
  • Queries are range scans with aggregation; point lookups are rare.
  • Hot shard vs cold shard: recent data in memory, older data compressed on disk.
Production Insight
Writing out of time order (backfill) is expensive — TSDBs penalize it heavily.
If you must backfill, batch into a single large insert during low load.
Rule: always send data with monotonically increasing timestamps.
Key Takeaway
Time series is an append-only log sorted by time.
Pick your storage engine based on write profile and query needs.
Backfilling with unordered timestamps kills write performance.
Choose Your Storage Paradigm
IfWrite throughput > 1M points/sec, high data ingestion rate
UseUse LSM-based TSDB (InfluxDB, VictoriaMetrics) — optimized for sequential writes and compaction
IfNeed SQL, joins with relational data, moderate write rate
UseUse TimescaleDB (PostgreSQL extension) — B-tree on time, supports full SQL
IfPure monitoring, scrape-based, no SQL needed
UseUse Prometheus — pull model, dimensional labels, integrated alerting

Storage Engine Internals: How TSDBs Organise Data on Disk

The storage engine determines write throughput, compression ratio, and query speed. Two dominant approaches exist: Log-Structured Merge (LSM) trees and custom B-Tree adaptations.

LSM Trees (InfluxDB, VictoriaMetrics, TimescaleDB hybrid)

Writes first land in a Write-Ahead Log (WAL), then are buffered in memory in a sorted data structure (memtable). When the memtable reaches a threshold (e.g., 64MB), it's flushed to disk as an immutable SSTable file. Background compaction merges multiple SSTables into larger ones, discarding tombstones and dead versions. This design gives excellent write throughput because every write is sequential. Reads require checking multiple layers (memory + several SSTables per level). Bloom filters accelerate point lookups, but range scans still need to merge overlapping SSTables.

B-Tree with Time-Ordered Primary Key (TimescaleDB, hybrid)

TimescaleDB uses PostgreSQL's B-Tree where the primary key is (time, optional hash) in a chunked table structure. Writes are still sequential-ish because time is monotonically increasing. It supports full SQL, joins, and relational flexibility. Write throughput is lower than LSM for pure time series, but query performance on small ranges is better due to efficient index lookups. B-Trees also have no write amplification from compaction — but they do have page splits and WAL overhead.

Custom Append-Only (Prometheus, M3DB)

Prometheus stores each time series in a block per 2-hour interval. Data is stored as delta-of-delta compressed chunks. The blocks are read-only once completed. This gives extremely fast read access to recent data but requires careful planning for long-term retention (usually offloaded to Thanos or Cortex).

io/thecodeforge/tsdb/LsmCompactionExample.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
package io.thecodeforge.tsdb;

import java.util.*;

/**
 * Simplified LSM compaction logic — merges sorted SSTables.
 */
public class LsmCompactionExample {

    private final List<List<TimeSeriesPoint>> levels;

    public LsmCompactionExample() {
        this.levels = new ArrayList<>();
    }

    public void addLevel(List<TimeSeriesPoint> points) {
        levels.add(new ArrayList<>(points));
    }

    public List<TimeSeriesPoint> compact() {
        // Merge all levels into one sorted list (like multi-way merge)
        PriorityQueue<TimeSeriesPoint> pq = new PriorityQueue<>(
            (a, b) -> Long.compare(a.timestampMs, b.timestampMs));

        for (List<TimeSeriesPoint> level : levels) {
            pq.addAll(level);
        }

        List<TimeSeriesPoint> result = new ArrayList<>();
        while (!pq.isEmpty()) {
            result.add(pq.poll());
        }
        return result;
    }

    record TimeSeriesPoint(long timestampMs, double value) {}
}
Compaction Amplification
LSM compaction can produce 10–50x write amplification. If your disk is slow (HDD, busy EBS), compaction will throttle ingestion. Always use NVMe SSD for LSM-based TSDBs. Monitor compaction lag via the 'compaction_queue' metric.
Production Insight
B-Tree TSDBs (TimescaleDB) degrade gracefully under memory pressure — LSM TSDBs OOM if compaction falls behind.
If you see 'flush stalls' or 'compaction queue > 100', reduce write batch size or increase compaction threads.
Rule: for high-throughput writes, prefer LSM; for complex queries with joins, prefer B-Tree.
Key Takeaway
LSM trees give high write throughput at the cost of read overhead and compaction.
B-Trees give SQL flexibility and predictable read performance.
Know your access pattern before choosing the storage engine.
Storage Engine Selection
IfWrite throughput primary concern, queries are simple aggregates
UseChoose LSM-based engine (InfluxDB, VictoriaMetrics)
IfNeed SQL, JOINs, and complex time range queries with relational data
UseChoose B-Tree-based engine (TimescaleDB)
IfPrometheus ecosystem already in place, need long term storage
UseUse Prometheus + Thanos (block-based, managed compaction)

Compression Techniques: How TSDBs Fit Petabytes into Gigabytes

Time series data is highly compressible because adjacent points are correlated — CPU utilization doesn't jump from 5% to 95% in a millisecond. TSDBs exploit this with specialized compression schemes.

Timestamp Compression: Delta-of-Delta

Instead of storing full 64-bit timestamps, TSDBs store the difference from the previous timestamp. Most timestamps arrive at a fixed interval (e.g., every 10s). The delta-of-delta is often zero or small. Prometheus uses this: for a batch of 122 points, it stores the first timestamp fully, then the first delta (usually the interval), then delta-of-delta values. When the interval is constant, delta-of-delta is zero and can be encoded in a single bit. This compresses 64-bit timestamps to an average of ~2 bits.

Value Compression: XOR (Gorilla)

Floating point values change slowly. Gorilla compression (originally from Facebook's Gorilla TSDB, now used in Prometheus) XORs consecutive values. If most bits are the same, the result has many leading zeros, which can be encoded with a variable-length scheme. For 64-bit doubles, this achieves ~1.4 bits per point on average for normal metrics.

Columnar Compression

InfluxDB stores each field in separate columns (similar to Parquet). Within a column, values are sorted by time first, so run-length encoding (RLE) and delta compression work well. TimescaleDB's compression uses native PostgreSQL compression types (ZSTD, LZ4) on chunks, achieving 5–20x reduction.

Real-world results: a 7-day retention of 10k series at 10s resolution (~60 million points) occupies ~12GB uncompressed, ~400MB compressed.

io/thecodeforge/tsdb/gorilla_compression.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
"""
Simplified Gorilla XOR compression for floating point values.
Based on the paper: Gorilla: A Fast, Scalable, In-Memory Time Series Database
"""

import struct

def xor_compress(values):
    """Compress list of float64 values using XOR of consecutive values."""
    if not values:
        return b""
    encoded = []
    prev = struct.pack('>d', values[0])
    encoded.append(prev)
    for v in values[1:]:
        curr = struct.pack('>d', v)
        xor_result = int.from_bytes(prev, 'big') ^ int.from_bytes(curr, 'big')
        # Store leading zeros count and the significant bits
        # (simplified - real Gorilla uses variable-length encoding)
        encoded.append(xor_result.to_bytes(8, 'big'))
        prev = curr
    return b"".join(encoded)

def xor_decompress(encoded):
    """Decompress XOR-compressed float64 values."""
    if not encoded:
        return []
    values = []
    prev_bytes = encoded[0:8]
    values.append(struct.unpack('>d', prev_bytes)[0])
    prev_int = int.from_bytes(prev_bytes, 'big')
    for i in range(8, len(encoded), 8):
        xor_int = int.from_bytes(encoded[i:i+8], 'big')
        curr_int = prev_int ^ xor_int
        values.append(struct.unpack('>d', curr_int.to_bytes(8, 'big'))[0])
        prev_int = curr_int
    return values

# Example usage
cpu_values = [0.5, 0.52, 0.51, 0.53, 0.54]
compressed = xor_compress(cpu_values)
# In real Gorilla, this would be ~40 bits for 5 values vs 320 bits raw
print(f"Compressed size: {len(compressed)} bytes (raw: {5*8} bytes)")
Compression Ratio Realities
Delta-of-delta + XOR gives ~10–15x compression on typical server metrics. For high-entropy data (random values), compression drops to 2–3x. Always test with your actual data patterns before capacity planning.
Production Insight
If your metrics have high entropy (e.g., unique random IDs as values), compression fails. Store those as labels instead.
Gorilla compression works best when values change slowly. Spiky metrics (error rates) still compress well because consecutive values are often 0 or 1.
Rule: benchmark compression on your real dataset — don't rely on vendor claims.
Key Takeaway
Timestamp compression uses delta-of-delta (average ~2 bits/timestamp).
Value compression uses XOR of floats (average ~1.4 bits/value).
Test compression on your data — high entropy kills compression ratio.

Retention Policies and Downsampling — Managing Storage Over Time

Time series data has diminishing value over time. The exact CPU reading from 3 months ago isn't useful — but the hourly average is. TSDBs manage this through retention policies and downsampling.

Retention Policies (RP)

InfluxDB's RP defines how long data is kept and how many copies exist. For example, 'autogen' RP with DURATION 7d, REPLICATION 1. Once data exceeds the duration, it's automatically deleted. TimescaleDB uses chunk-based retention: you can drop chunks older than a threshold using drop_chunks() or automate with a background job. Prometheus stores data in blocks per retention period; blocks older than --storage.tsdb.retention.time are deleted.

Downsampling / Continuous Queries

To keep long-term historical data without blowing your storage budget, you create rollups — e.g., store raw data at 10s resolution for 7 days, then 1-minute averages for 30 days, then 1-hour averages for 1 year. InfluxDB calls this Continuous Queries (CQ). TimescaleDB uses time_bucket() and continuous aggregates. Prometheus recording rules can precompute rollups.

Shard Groups and Time-Based Partitioning

Data is written to shard groups that correspond to time ranges (e.g., 7-day shards). Each shard is stored as a set of files (TSM files in InfluxDB, chunks in Prometheus). When a shard is closed (no longer receiving writes), it can be fully compressed and moved to cold storage or even archived to object storage (e.g., S3) using tools like Thanos or InfluxDB Cloud's tiered storage.

io/thecodeforge/tsdb/continuous_aggregate.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- TimescaleDB: Create a continuous aggregate downsampling 1-minute data to 1-hour averages
CREATE MATERIALIZED VIEW cpu_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    host,
    AVG(usage) AS avg_usage,
    MAX(usage) AS max_usage
FROM cpu_usage
GROUP BY bucket, host;

-- InfluxDB Continuous Query equivalent
CREATE CONTINUOUS QUERY "cpu_hourly" ON "mydb"
BEGIN
  SELECT mean(value) AS mean_value
  INTO cpu_1h
  FROM cpu_raw
  GROUP BY time(1h), *
END;

-- Prometheus recording rule (rules.yml)
# groups:
# - name: downsampling.rules
#   interval: 1h
#   rules:
#   - record: node_cpu_hourly:avg
#     expr: avg_over_time(node_cpu_seconds_total[1h])
Data Temperature Pyramid
  • Hot tier: raw data in memory (last 6h) — fastest queries, expensive storage.
  • Warm tier: compressed data on SSD (last 7 days) — good performance, moderate cost.
  • Cold tier: downsampled rollups on HDD/object storage (months/years) — cheap, slower queries.
Production Insight
Downsampling too aggressively loses precision for anomaly detection. Keep raw data at least 24h for incident forensics.
Setting retention too short leads to 'data holes' when debugging historic issues. Always ask: 'What data would I need for this time next year?'
Rule: define retention tiers before production — raw 7d, 1m avg 60d, 1h avg 365d.
Key Takeaway
Use retention policies to automate data deletion.
Downsample raw data into rollups for long-term storage.
Plan tiered storage: hot (SSD, raw), warm (SSD, compressed), cold (object, rollups).
Choose Downsampling Strategy
IfNeed sub-10 second precision for recent data
UseKeep raw at 1s resolution for 48h, then downsample to 1m
IfCompliance requires 1-year retention but low query frequency on old data
UseDownsample to 1h averages after 30 days, store in cold object storage
IfOnly need aggregate trends (no point-in-time accuracy beyond 1 week)
UseDownsample immediately — use 5m averages from ingestion

Cardinality: The Silent Performance Killer

Cardinality is the number of unique time series in your TSDB. Each combination of metric name + label key/value pair creates a distinct series. For Prometheus: http_requests_total{method="GET", endpoint="/api", status="200"} is one series. Add a label user_uuid with 10,000 different values, and you get 10,000 series — for just this one metric.

Why cardinality matters

  • Index memory: TSDBs keep an in-memory index of all series. High cardinality consumes RAM and slows down writes because every new label combination must be added to the index (usually requires a lock or CAS operation).
  • Query performance: Queries that scan many series (e.g., 'sum of all requests') must iterate over thousands of series. Even with aggregation, the planner needs to evaluate each series.
  • Compaction overhead: More series means more SSTable files (in LSM) or more blocks (Prometheus). Compaction merges many small files, increasing CPU and I/O.

Safe cardinality thresholds

  • Prometheus: keep total series under 500k per server (for 8GB RAM). Use --storage.tsdb.max-series to enforce a hard limit.
  • InfluxDB: keep series per measurement under 100k. Use max-series-per-database config.
  • TimescaleDB: less sensitive because indexing is disk-based (B-Tree), but high cardinality still hurts query performance. Keep distinct tag values under 1,000 per dimension.

How cardinality explosions happen

Common culprits: Including request IDs, timestamps, or random values as labels. For example, adding a 'container_id' label in a Kubernetes cluster with 500 pods creates 500 series per metric. Add 'pod_name' with unique names and it multiplies. Multiply by every redeployment (new container IDs), and you get unbounded growth.

io/thecodeforge/tsdb/cardinality_check.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- InfluxDB: Check cardinality per measurement
SHOW SERIES CARDINALITY ON "mydb";

-- InfluxDB: Find the measurement with highest cardinality
SELECT *
FROM (
  SELECT measurement, count(*) as series_count
  FROM (SHOW SERIES)
  GROUP BY measurement
) ORDER BY series_count DESC;

-- TimescaleDB: Count distinct series (using hyperloglog for efficiency)
SELECT hyperloglog_distinct(tags::text) AS estimated_cardinality
FROM conditions
WHERE time > now() - interval '1 hour';

-- Prometheus: Via API
curl http://localhost:9090/api/v1/status/tsdb | jq '.data.seriesCountByMetricName | sort_by(.value) | reverse[0:5]'
Cardinality Explosion Prevention
Never use high-cardinality values (user IDs, request IDs, timestamps) as labels. Use them as the value of the metric if needed, or log them to a separate indexing system (Elasticsearch). Set a hard limit on cardinality in your TSDB config before going to production.
Production Insight
Cardinality explosions often happen during load tests or when onboarding new services without label review.
The symptom is subtle: write throughput drops slowly over hours, not a sudden crash. By the time alert fires, the index is already bloated.
Rule: set cardinality alerts before production — trigger at 80% of the hard limit and page the team.
Key Takeaway
Cardinality is the product of metric * label values.
Keep per-measurement cardinality under 100k series.
Design labels like database indexes — fewer, more selective.
Managing Cardinality Design
IfMetric value depends on a unique identifier (e.g., user ID, session ID)
UseDo not use as label. Either make it a metric value or send to a separate logging system.
IfLabel has high churn (e.g., container IDs change every deploy)
UseUse a rolling hash or reduce to a few categories (e.g., 'version' not 'commit_sha').
IfYou need to filter by user ID but cardinality is high
UseStore user ID in the metric value or use a separate Prometheus per tenant.

Choosing the Right TSDB: InfluxDB, TimescaleDB, or Prometheus

You don't start with 'which TSDB is best?' You start with 'what's my workload?' The answer hinges on three things: write throughput, query complexity, and ecosystem.

InfluxDB - Best for: High-volume IoT metrics, application monitoring, sensor data. - Engine: LSM-based (TSM over RocksDB). Strong compression, high write throughput (millions of points/sec on a single node). - Query: InfluxQL or Flux (deprecating Flux in 3.x). No SQL, limited joins. - Retention: Native RP and CQ. Downsampling built-in. - Scalability: Single-node OSS up to ~1M series. Enterprise/Cloud for clustering. No built-in federation.

TimescaleDB - Best for: Hybrid workloads — time series that occasionally need SQL joins with relational data (e.g., financial trades, sensor + metadata). - Engine: PostgreSQL with automatic partitioning (hypertable). Full SQL support, JSON, JOINs, window functions. - Query: PostgreSQL SQL. Rich ecosystem of tools. - Retention: Via chunk dropping or timescaledb_expire chunk job. - Scalability: Single-node with streaming replication for HA. Distributed version (timescaledb scale) since 2.13.

Prometheus - Best for: Cloud-native monitoring, Kubernetes, alerting tied to metrics. - Engine: Custom block-based storage. Pull model (or push via Pushgateway). - Query: PromQL — powerful, but different from SQL. Excellent for alerting rules. - Retention: Block-based with configurable retention. Long term via Thanos/Cortex. - Scalability: Single-server limited to ~1M series. Use Thanos for aggregation across clusters.

When to avoid each - Avoid InfluxDB if you need SQL joins or complex relational queries. - Avoid TimescaleDB if your write throughput exceeds 500k points/sec on a single node (use LSM alternatives). - Avoid Prometheus if you need to push metrics from many sources without a central coordinator (use Pushgateway carefully).

Ecosystem Fit
If your org already uses Grafana and Prometheus for alerting, adding another TSDB means maintaining two monitoring stacks. Consider whether the new workload fits Prometheus + Thanos before introducing a second system.
Production Insight
Switching TSDBs mid-project is painful — migration requires re-architecting queries and retention policies.
Do a proof-of-concept with real data volume and query patterns before committing.
Rule: choose one primary TSDB per org and standardize; exceptions only for workloads that fundamentally differ in access pattern.
Key Takeaway
Match TSDB to workload: write throughput, query complexity, and ecosystem.
Prometheus for monitoring, TimescaleDB for SQL, InfluxDB for IoT.
Start with one primary TSDB; avoid multi-TSDB complexity unless necessary.
TSDB Selection Matrix
IfPrimary workload: infrastructure monitoring, Kubernetes, no SQL needed
UseChoose Prometheus (+ Thanos for long-term)
IfNeed SQL, time series + relational joins, moderate throughput
UseChoose TimescaleDB
IfExtremely high write throughput (IoT, finance tick data)
UseChoose InfluxDB or VictoriaMetrics
IfAlready invested in PostgreSQL ecosystem, want minimal new tech
UseChoose TimescaleDB — standard SQL skills transfer
● Production incidentPOST-MORTEMseverity: high

Cardinality Explosion — The 3am Outage That Took Down Monitoring

Symptom
Prometheus queries started timing out after 30s. InfluxDB heap usage spiked to 32GB, then OOM-killed the process every 4 hours. Write throughput dropped to <100 points/sec.
Assumption
The team assumed unlimited cardinality was safe because each pod had unique container IDs as labels. They thought the TSDB could 'handle it' with horizontal scaling.
Root cause
Cardinality is the product of distinct values across all tag sets. 5000 pods × 20 machines × 10 applications × each with 3 different container states = 1.2M unique time series. Each series consumes ~48 bytes in the index, totalling 57MB for index alone, plus memory overhead for caching. InfluxDB stores the index in memory by default — once it exceeds heap, GC thrashing begins.
Fix
Reduced cardinality by removing high-cardinality labels (container IDs) and moving them to a separate log system. Implemented cardinality limits per measurement at the database level (max 100k series per measurement). Added alerting on cardinality growth rates. Restart with 8GB heap and pre-created shard groups for new retention periods.
Key lesson
  • Always set a hard cardinality limit in TSDB configuration before ingesting production data.
  • Monitor cardinality as a first-class metric — track distinct series count per measurement over time.
  • Design label schemas to keep distinct values under 10 per dimension. Pod IDs belong in logs, not metrics.
Production debug guideDiagnose the five most common TSDB failures in production5 entries
Symptom · 01
Query times suddenly spike from 50ms to 5s, but data ingestion appears normal
Fix
Check memory pressure with top and free -m on the TSDB node. Run influx_inspect report-shards (InfluxDB) or tsdb analyze (Prometheus) to identify large shards. Merge or downsample old shards.
Symptom · 02
Write throughput drops to <10% of normal after adding new labels
Fix
Cardinality explosion alert. Run SHOW SERIES CARDINALITY or SHOW TAG KEYS to identify high-cardinality tags. Remove or relabel immediately. Increase shard retention duration to reduce index rebuild frequency.
Symptom · 03
Disk usage grows faster than expected, even with short retention
Fix
Check compression ratio with influxd_inspect dumptsm (InfluxDB) or tsdb retention (TimescaleDB). Ensure compression is enabled for each column. For Prometheus, verify --storage.tsdb.retention.time and --storage.tsdb.max-block-duration are set correctly.
Symptom · 04
Containers in Kubernetes monitoring stack crash with OOMKilled
Fix
Check vector or Prometheus memory limits. Set --storage.tsdb.min-block-duration=2h to reduce memory for in-memory blocks. For InfluxDB, reduce max-series-per-database in config. Add memory request/limit to pod spec.
Symptom · 05
Queries returning stale data or missing records after a failover
Fix
Check replication factor and consistency level. In TimescaleDB, verify all chunks are replicated across nodes. For InfluxDB OSS, there is no HA — use InfluxDB Enterprise or a proxy layer. Ensure write acknowledgements are set to quorum.
★ TSDB Quick Debug Cheat SheetFive production scenarios and the exact commands to run
High cardinality detected
Immediate action
List top cardinality series
Commands
influx -database 'mydb' -execute 'SHOW SERIES CARDINALITY'
curl http://localhost:9090/api/v1/status/tsdb | jq '.data.labelValuesCount'
Fix now
Drop high-cardinality measurements: DROP MEASUREMENT "high_card_metric"
Query slow on large time range+
Immediate action
Check if query uses full scan
Commands
EXPLAIN ANALYZE SELECT * FROM metrics WHERE time > now() - 7d
influx -execute 'EXPLAIN SELECT mean(value) FROM cpu WHERE time > now() - 1d'
Fix now
Add a continuous query to downsample hourly
Disk filling fast+
Immediate action
Identify largest shards
Commands
influx_inspect report-shards -database mydb
du -sh /var/lib/influxdb/data/*
Fix now
Extend retention policy: ALTER RETENTION POLICY "autogen" ON "mydb" DURATION 30d
Write throughput drop+
Immediate action
Check connections and series creation rate
Commands
influx -execute 'SHOW STATS' | grep 'writePointsOk'
curl http://localhost:9090/api/v1/query?query=rate(prometheus_tsdb_wal_writes[5m])
Fix now
Batch writes >=100 points per request, use gzip compression
OOM crash on TSDB pod+
Immediate action
Check memory limit and heap size
Commands
cat /proc/meminfo | grep MemTotal
journalctl -u influxdb | grep -i 'OutOfMemory'
Fix now
Set INFLUXD_DATA_MAX_SERIES_PER_DATABASE=100000 and increase memory limit to 8GB
Storage Engine Comparison
PropertyLSM (InfluxDB, VictoriaMetrics)B-Tree (TimescaleDB)Block-based (Prometheus)
Write Amplification10-50x (compaction)Low (WAL + page splits)None (immutable blocks)
Read AmplificationHigh (multiple SSTables per query)Low (single B-Tree lookup)Moderate (merge multiple chunks)
Compression EfficiencyExcellent (Gorilla + columnar)Good (TOAST + ZSTD)Excellent (delta-of-delta + XOR)
Row UpdateInsert new version (append)In-place update (B-Tree)Append only (no updates)
Concurrent WritesExcellent (sequential WAL)Good (B-Tree locking)Good (append to current block)
Query FlexibilityLimited (no JOINs)Full SQLPromQL (highly optimized)

Key takeaways

1
Time series databases are optimized for append-only writes and time-range queries, not updates or random access.
2
Storage engine choice (LSM vs B-Tree vs block-based) determines write vs read performance trade-offs.
3
Compression (delta-of-delta + XOR) can reduce storage by 90% but requires low-entropy, slowly changing values.
4
Cardinality is the number of unique metric series
keep it under 100k per measurement or risk OOM.
5
Retention policies and downsampling are mandatory; plan tiers (raw, hourly, daily) before production.
6
Choose the right TSDB by workload
InfluxDB for IoT throughput, TimescaleDB for SQL joins, Prometheus for monitoring.

Common mistakes to avoid

5 patterns
×

Using high-cardinality labels like 'user_id' or 'request_id'

Symptom
Index memory grows unbounded, write throughput drops by 90% within hours. Eventually the TSDB process OOMs.
Fix
Remove high-cardinality labels from metric schema. If you must track per-user metrics, either use a separate TSDB tenant per user or store the user ID as a metric value (not a label).
×

Not setting retention policies before production

Symptom
Disk fills completely after a few days. Queries on old data return stale cached results because compaction backlog grows unbounded.
Fix
Configure retention policy (DURATION + SHARD DURATION) before any writes. For InfluxDB: CREATE RETENTION POLICY "30d" ON "mydb" DURATION 30d REPLICATION 1. For Prometheus: set --storage.tsdb.retention.time=30d.
×

Writing data points out of order (backfilling with random timestamps)

Symptom
WAL grows large, compaction stalls, write latency spikes from 1ms to 1s. Disk space fills quickly due to unmerged SSTables.
Fix
Always send data in timestamp order. If backfilling is unavoidable, batch writes in a single large POST and ensure timestamps are monotonically increasing within each shard interval. Use InfluxDB's backfill option with timestamp_precision set.
×

Ignoring compression impact on query speed

Symptom
Queries on compressed data are 50x slower than expected. The database decompresses on the fly, adding CPU overhead per query.
Fix
Use pre-aggregated rollups for common query patterns. For Prometheus, use recording rules to precompute slow queries. For InfluxDB, use continuous queries. Monitor CPU at the TSDB node level during peak query hours.
×

Assuming TSDBs handle write spikes gracefully

Symptom
During a deployment event (all containers restart), write rate spikes 100x. The TSDB's WAL fills disk or compaction queue backs up, causing eventual OOM.
Fix
Implement client-side rate limiting or buffering. Use a message queue (Kafka) between application and TSDB to absorb spikes. In InfluxDB, set write-timeout and max-values-per-tag to reject excessive data.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
Explain how time series data differs from transactional data and how a T...
Q02SENIOR
What is cardinality in a TSDB and why is it critical for performance?
Q03SENIOR
Compare InfluxDB and TimescaleDB for a financial trading system that nee...
Q04SENIOR
How does Prometheus handle long-term storage and what are its limitation...
Q05SENIOR
Explain how Gorilla (delta-of-delta + XOR) compression works and why it'...
Q01 of 05SENIOR

Explain how time series data differs from transactional data and how a TSDB exploits those differences.

ANSWER
Transactional data is frequently updated, deleted, and requires random access (e.g., user profile). Time series data is append-only, rarely updated, and always queried by time range. TSDBs exploit this by: (1) using append-only storage engines (LSM) that avoid random writes, (2) compressing timestamps and values using delta-of-delta and XOR, (3) partitioning data by time (shard groups) so old data can be dropped or downsampled, and (4) using index structures optimized for range scans and aggregates rather than point lookups.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is a time series database in simple terms?
02
Can I use PostgreSQL or MySQL for time series data?
03
What is the difference between InfluxDB and Prometheus?
04
How do I estimate storage requirements for a time series database?
05
Why does my TSDB crash with OOM when I add more metrics?
🔥

That's Databases in Design. Mark it forged?

8 min read · try the examples if you haven't

Previous
Data Lake vs Data Warehouse
5 / 5 · Databases in Design
Next
OAuth 2.0 and OpenID Connect