Prometheus OOMKill: High-Cardinality Labels in Kubernetes
user_id label explosion: 250k series/hour, 24GB RAM, OOMKill on 16GB pod.
- Pull model: Prometheus scrapes /metrics endpoints on targets at configured intervals.
- Service Discovery: Queries Kubernetes API to find pods, services, nodes dynamically — no static IPs.
- Four metric types: Counter (only goes up), Gauge (up/down), Histogram (bucketed observations), Summary (client-side quantiles).
- Recording rules: Pre-compute expensive PromQL into cheap time series.
- Alerting: Prometheus evaluates rules, Alertmanager routes to PagerDuty/Slack.
- Pull model means Prometheus must reach every target. NetworkPolicy misconfigs silently break scraping.
- High-cardinality labels (user_id, request_id) will OOM Prometheus.
- Using Summary instead of Histogram in multi-replica deployments. Summaries cannot be aggregated across instances.
Imagine your Kubernetes cluster is a busy hospital. Dozens of doctors (pods), nurses (services), and wards (namespaces) are running simultaneously. Prometheus is like the hospital's central monitoring board — it walks around every few seconds, checks each room's vitals (CPU, memory, request rates), writes them down in a giant logbook, and sounds an alarm if a patient's heart rate spikes. You don't wait for something to go wrong — the board tells you before it becomes a crisis.
Running Kubernetes in production without monitoring is like flying a commercial aircraft with the instrument panel blacked out. Everything might feel fine until it catastrophically isn't. Prometheus is used by over 84% of Kubernetes production environments — not because it's the easiest tool, but because it's the most powerful pull-based metrics system that was purpose-built for dynamic, containerized infrastructure.
The real problem Prometheus solves is the ephemeral nature of Kubernetes workloads. Traditional monitoring tools expect your target IPs to stay fixed. In Kubernetes, a pod gets a new IP whenever it is recreated or rescheduled. Prometheus solves this with Kubernetes-native service discovery: it queries the Kubernetes API server directly to find what's alive right now, not what was alive when you wrote the config.
This is not a getting-started guide. It covers scrape configurations with relabeling, custom application metrics using client libraries, recording rules to avoid query-time explosions, Alertmanager integration, and the five most expensive mistakes teams make in production.
Prometheus Stack (Operator) Component Visual
The Prometheus Operator for Kubernetes introduces a set of Custom Resource Definitions (CRDs) that declaratively define the Prometheus monitoring stack. Understanding how these components fit together is essential before diving into scrape configuration.
- Prometheus: Defines a Prometheus statefulset, including retention, storage, and resource limits. The operator manages the Prometheus pods, config reloading, and target reconciliation.
- Alertmanager: Defines an Alertmanager cluster with config secret, routing, and receivers. The operator creates the Alertmanager pods and manages the config.
- ServiceMonitor: Declares how to scrape a Kubernetes service. The operator converts ServiceMonitor selectors into Prometheus scrape targets. It is the most common way to configure scraping in operator-managed setups.
- PodMonitor: Like ServiceMonitor but scrapes individual pods directly, without an intermediate service. Useful for scraping metrics from pods that are not behind a service.
- PrometheusRule: Contains recording and alerting rules. The operator loads them into Prometheus.
Additionally, the ScrapeConfig CRD (introduced in Prometheus Operator v0.65+) provides a lower-level way to define scrape targets with full Prometheus scrape_config semantics, bypassing the semantic filters of ServiceMonitor/PodMonitor.
A typical production deployment creates a Prometheus CR, an Alertmanager CR, and one or more ServiceMonitor or ScrapeConfig CRs, plus PrometheusRule CRs for alerts. The operator watches these CRDs and reconciles the state of Prometheus and Alertmanager instances.
The diagram below shows the relationships: the Operator watches CRDs, creates Prometheus/Alertmanager pods, and translates ServiceMonitors into scrape configs.
- Prometheus CR specifies selectors for ServiceMonitors and PrometheusRules.
- Each ServiceMonitor must have labels matching serviceMonitorSelector.
- Each PrometheusRule must have labels matching ruleSelector.
- Use --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false in Helm to avoid default label restrictions.
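The selector rules in the list above boil down to a label-subset check. The following is a minimal sketch of that check, not the Operator's actual code; the release name `kube-prometheus` and the two ServiceMonitor label sets are hypothetical.

```python
def matches(match_labels, labels):
    """matchLabels semantics: every selector pair must appear on the object."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Hypothetical selector from a Prometheus CR installed by a Helm release
# named "kube-prometheus".
service_monitor_selector = {"release": "kube-prometheus"}

# Hypothetical ServiceMonitor labels.
service_monitors = {
    "payments": {"app": "payments", "release": "kube-prometheus"},
    "orders": {"app": "orders"},  # applied with kubectl, no release label
}


def selected(selector):
    """Names of the ServiceMonitors that the given selector would pick up."""
    return sorted(
        name for name, labels in service_monitors.items() if matches(selector, labels)
    )


print(selected(service_monitor_selector))  # only the labeled ServiceMonitor
print(selected({}))                        # an empty selector matches everything
```

The second call mirrors what setting serviceMonitorSelectorNilUsesHelmValues=false is meant to achieve: with no label requirement, the hand-applied ServiceMonitor is selected too.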
By default, the kube-prometheus-stack Helm chart sets serviceMonitorSelectorNilUsesHelmValues: true. This means the Operator only picks up ServiceMonitors with the Helm release label. If you create a ServiceMonitor manually using kubectl apply, it won't be scraped unless you also add the label release: <name>. This catches many teams off guard. Set serviceMonitorSelectorNilUsesHelmValues: false to allow unlabeled ServiceMonitors, or always include the release label.
How Prometheus Service Discovery Works Inside Kubernetes
Prometheus uses a pull model — it reaches out to targets and scrapes metrics endpoints, typically on path /metrics, at a configured interval. In a static world you'd list IPs. In Kubernetes, Prometheus uses kubernetes_sd_configs to query the Kubernetes API and discover pods, services, endpoints, nodes, and ingresses dynamically.
When Prometheus starts, it authenticates to the API server using a ServiceAccount token mounted in its pod. It then watches specific resource types. For the endpoints role, Prometheus discovers every Endpoints object across the cluster. For each endpoint address it finds, it creates a scrape target. The magic happens during relabeling — a pipeline that runs before the scrape and lets you filter, rename, and attach labels using values pulled directly from Kubernetes metadata (pod annotations, namespace labels, service names).
The annotation prometheus.io/scrape: 'true' is a community convention that Prometheus relabeling configs check. If the annotation exists and is true, the pod is scraped. This means enabling monitoring for a new application is as simple as adding three lines to its pod spec, with no Prometheus config reload needed. Service discovery picks up new targets automatically as the Kubernetes API reports them.
Understanding the target lifecycle is critical for production. Targets move through states: unknown, up, and down. A target is unknown until its first scrape has been attempted. It goes down when a scrape fails: the endpoint is unreachable (network issue or pod not started), the HTTP scrape returns a non-200 status, or the scrape times out. Staleness markers are injected after a target disappears, which prevents old time series from polluting range queries.
- Multi-container pods expose multiple ports. Prometheus picks one, and not always the right one.
- The __address__ label determines where Prometheus connects. Relabeling rewrites it.
- Without prometheus.io/port, Prometheus uses the first container port in the pod spec.
- Sidecar containers (Istio, Envoy) often expose ports that are not your metrics port.
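To make the relabeling step concrete, here is a small Python sketch of what an annotation-based relabel pipeline does to one discovered target: keep it only if it is annotated, then rewrite __address__ and the metrics path. It imitates the behaviour of relabel_configs rather than reproducing Prometheus code, and the IP addresses and ports in the usage lines are made up.

```python
import re

# Meta labels that Kubernetes service discovery derives from pod annotations.
SCRAPE = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"
PORT = "__meta_kubernetes_pod_annotation_prometheus_io_port"
PATH = "__meta_kubernetes_pod_annotation_prometheus_io_path"


def relabel(labels):
    """Return the rewritten label set, or None when the target is dropped."""
    # keep: only targets annotated prometheus.io/scrape: "true" survive.
    if labels.get(SCRAPE) != "true":
        return None
    out = dict(labels)
    # replace: override the metrics path when the annotation is present.
    if labels.get(PATH):
        out["__metrics_path__"] = labels[PATH]
    # replace: source labels are joined with ';' and matched by an anchored
    # regex; on a match, host and annotated port become the new __address__.
    joined = f"{labels['__address__']};{labels.get(PORT, '')}"
    m = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", joined)
    if m:
        out["__address__"] = f"{m.group(1)}:{m.group(2)}"
    return out


# Hypothetical pod: discovery found the Envoy sidecar port first.
annotated = {"__address__": "10.0.0.7:15090", SCRAPE: "true", PORT: "8080", PATH: "/metrics"}
no_port = {"__address__": "10.0.0.8:15090", SCRAPE: "true"}
unannotated = {"__address__": "10.0.0.9:8080"}

print(relabel(annotated)["__address__"])
print(relabel(no_port)["__address__"])
print(relabel(unannotated))
```

The second target shows the sidecar hazard from the list above: with no port annotation the regex does not match, so Prometheus would keep scraping whatever port discovery supplied.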
When a NetworkPolicy blocks Prometheus, targets show as down or unknown with no useful error. Always include Prometheus's namespace in your NetworkPolicy ingress rules. Use kubectl exec from the Prometheus pod to test connectivity to the target before blaming the scrape config.
Set prometheus.io/scrape, prometheus.io/port, and prometheus.io/path explicitly.
ServiceMonitor vs ScrapeConfig Decision Guide
When using the Prometheus Operator, you have two primary ways to define scrape targets: ServiceMonitor (and its cousin PodMonitor) and the newer ScrapeConfig CRD (available since Prometheus Operator v0.65). Understanding which to use is critical for production architecture.
ServiceMonitor is the original CRD. It is designed to scrape a Kubernetes Service. You specify a service selector (by labels), and the Operator automatically discovers all endpoints behind that service. It adds semantic constraints: the service must expose one or more ports, and you can optionally specify a path, interval, and metric relabeling. ServiceMonitor is high-level: the Operator handles converting service endpoints to individual pod IP addresses.
ScrapeConfig is a lower-level CRD. It directly mirrors the Prometheus scrape_config block. You define kubernetes_sd_configs, relabel_configs, metric_relabel_configs, etc., exactly as you would in a raw Prometheus YAML file. There are no implicit service-based semantics. This gives you full control but requires more expertise.
The decision tree below helps pick the right one.
Use ServiceMonitor when:
- You want to scrape a standard Kubernetes Service (preferred for 90% of cases).
- You need simplicity: just select the service by label and define a port.
- You want the Operator to dynamically follow pods as they scale or roll.
- You are monitoring a typical application pod behind a Service.
Use ScrapeConfig when:
- You need full control over relabeling, including non-standard discovery (e.g., scraping a non-Kubernetes endpoint, or a static target).
- You need to scrape a target that is not behind a Service (e.g., a DaemonSet pod that you want to scrape by node).
- You want to use the full power of kubernetes_sd_configs roles beyond endpoints (e.g., node, ingress).
- You are migrating from a raw Prometheus configuration and want to keep the same scrape config syntax.
A common production pattern: use ServiceMonitor for all user-facing services (HTTP APIs, gRPC), and ScrapeConfig for infrastructure components (node-exporter, kube-state-metrics, or custom exporters that expose metrics on non-standard ports).
- ServiceMonitor: high-level, service-oriented, Operator abstracts pod IPs.
- ScrapeConfig: low-level, full Prometheus config syntax, full control over discovery.
- Use ServiceMonitor for typical HTTP services on standard ports.
- Use ScrapeConfig for node-level scraping, non-Kubernetes targets, or complex relabeling.
- ScrapeConfig supports all kubernetes_sd_config roles: pod, service, endpoints, node, ingress.
In a ServiceMonitor selector, use matchLabels with a specific app label, not broad selectors that could match system services. ScrapeConfig, on the other hand, lets you define exactly which kubernetes_sd_config role to use and filter with relabel_configs. A common pattern is to use ServiceMonitor for business applications and ScrapeConfig for infrastructure components like node-exporter, cAdvisor, or custom DaemonSet-based exporters.
Exposing Custom Application Metrics with the Prometheus Client Libraries
Kubernetes infrastructure metrics (CPU, memory, network) come from kube-state-metrics and node-exporter. But the metrics that make or break your SLOs are application-level: request latency, error rates, queue depth, cache hit ratio. These come from instrumenting your own code.
Prometheus has four core metric types you need to understand at the semantic level, not just the API level:
Counter: a value that only goes up (resets to zero on restart). Use it for total requests, total errors, total bytes sent. Never use a counter for something that can decrease. PromQL's rate() and increase() functions unwrap counters properly, handling resets.
Gauge: a value that can go up or down. Use it for current queue depth, active connections, temperature, memory usage. Don't use rate() on a gauge; the result is meaningless.
Histogram: pre-aggregated bucketed observations. Use it for latency and request size. It exposes three families of time series: _bucket (one series per bucket boundary), _sum, and _count. The bucket boundaries you choose at instrumentation time are fixed in code: you can't change them without redeploying the application.
Summary: client-side computed quantiles. Use it only when you need accurate quantiles from a single instance and don't need to aggregate across instances (summary quantiles can't be aggregated in PromQL). In Kubernetes with multiple replicas, histograms are almost always the right choice over summaries.
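Why histograms aggregate and summaries do not is easier to see with numbers. The sketch below is a simplified re-implementation of the linear interpolation that histogram_quantile performs, not the Prometheus source; the bucket bounds and counts for the two replicas are invented for illustration.

```python
INF = float("inf")


def histogram_quantile(q, buckets):
    """buckets: (upper_bound, cumulative_count) pairs sorted by bound, ending at +Inf."""
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == INF:
                return lower_bound  # quantile lies beyond the last finite bucket
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


def merge(*replicas):
    """Sum cumulative counts per bucket, like sum by (le) across instances."""
    return [
        (bound, sum(replica[i][1] for replica in replicas))
        for i, (bound, _) in enumerate(replicas[0])
    ]


# Hypothetical latency buckets (seconds) from two replicas of one service.
replica_a = [(0.1, 90), (0.5, 99), (1.0, 100), (INF, 100)]  # mostly fast
replica_b = [(0.1, 10), (0.5, 50), (1.0, 100), (INF, 100)]  # mostly slow

p99_a = histogram_quantile(0.99, replica_a)
p99_b = histogram_quantile(0.99, replica_b)
p99_merged = histogram_quantile(0.99, merge(replica_a, replica_b))

print(round(p99_a, 3), round(p99_b, 3))   # per-replica p99
print(round((p99_a + p99_b) / 2, 3))      # averaging quantiles, summary-style
print(round(p99_merged, 3))               # p99 of the merged buckets
```

Averaging the two per-replica quantiles gives a different answer from the p99 of the combined traffic. Only the bucket counts carry enough information to merge, which is the reason to prefer histograms with multiple replicas.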
Production-Grade Recording Rules and Alerting That Won't Page You at 3am
Raw PromQL queries against high-cardinality data are expensive. A query like rate(http_requests_total[5m]) across 200 pods runs every time a dashboard loads. In large clusters, this causes Prometheus to churn through millions of samples per query, leading to query timeouts and the dreaded 'query timed out in expression evaluation' error.
Recording rules solve this by pre-computing expensive expressions and storing the result as a new time series. Prometheus evaluates recording rules on its evaluation interval (typically 1m), writes the result into its TSDB, and future queries read that cheap pre-computed series instead of re-scanning the raw data.
Naming matters. The Prometheus community convention for recording rule names is level:metric:operations. For example job:http_requests_total:rate5m means: aggregated at the job level, derived from http_requests_total, computed as a 5-minute rate. Sticking to this convention makes rules self-documenting and searchable.
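A convention like this is easy to enforce mechanically. As a rough illustration, the hypothetical helper below (not part of any Prometheus tooling) splits a rule name into its three parts and rejects names that do not follow level:metric:operations; a check like it could run in CI over rule files.

```python
import re

# level:metric:operations, for example job:http_requests_total:rate5m
RULE_NAME = re.compile(
    r"^(?P<level>[a-zA-Z_]\w*):(?P<metric>[a-zA-Z_]\w*):(?P<operations>[a-zA-Z_]\w*)$"
)


def parse_rule_name(name):
    """Return the three parts of a recording rule name, or None if it does not conform."""
    match = RULE_NAME.match(name)
    return match.groupdict() if match else None


print(parse_rule_name("job:http_requests_total:rate5m"))
print(parse_rule_name("http_requests_rate"))  # None: not level:metric:operations
```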
Alerts in Prometheus are defined in the same YAML format as recording rules. The critical production insight is that alerts should express SLO burn rates, not raw thresholds. An alert that fires when error rate > 1% will fire constantly during minor blips. An alert based on a multi-window burn rate (Google's SRE model) only fires when you're burning through your error budget fast enough to exhaust it within a prediction window — dramatically reducing noise.
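The burn-rate idea reduces to a few lines of arithmetic. The sketch below assumes a 99.9% availability SLO measured over a 30-day window and the 14.4 burn rate that the Google SRE workbook pairs with a one-hour window; all three values are assumptions to replace with your own.

```python
def burn_rate_threshold(slo, burn_rate):
    """Error ratio at which the budget burns `burn_rate` times faster than allowed."""
    return burn_rate * (1.0 - slo)


def hours_to_exhaust(window_days, burn_rate):
    """Hours until the whole error budget is gone at a constant burn rate."""
    return window_days * 24 / burn_rate


def budget_consumed(window_days, burn_rate, hours):
    """Fraction of the error budget used after `hours` at this burn rate."""
    return burn_rate * hours / (window_days * 24)


SLO = 0.999        # assumed SLO: 99.9% of requests succeed
WINDOW_DAYS = 30   # assumed SLO window
BURN_RATE = 14.4   # assumed fast-burn multiplier

print(round(burn_rate_threshold(SLO, BURN_RATE), 4))     # alert threshold as an error ratio
print(round(hours_to_exhaust(WINDOW_DAYS, BURN_RATE), 1))  # hours until the budget is gone
print(round(budget_consumed(WINDOW_DAYS, BURN_RATE, 1), 3))  # budget share burned in one hour
```

Under these assumptions the alert fires at an error ratio of about 1.44%, a rate that would spend 2% of the monthly budget in one hour and all of it in roughly 50 hours. A raw "error rate > 1%" threshold carries none of that context.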
Alertmanager: Routing, Silencing, and Deduplication
Prometheus evaluates alert rules and sends firing alerts to Alertmanager. Alertmanager is a separate component responsible for deduplicating, grouping, routing, and silencing alerts before sending them to notification channels (PagerDuty, Slack, email, Opsgenie).
Alertmanager's routing tree is a hierarchical matching system. An alert's labels are matched against the route configuration. The first matching route determines the receiver. This means your alert labels (severity, team, service) must be carefully designed to match your routing tree.
- Deduplication: Same alert fingerprint = one notification.
- Grouping: group_by determines which alerts are batched together.
- Inhibition: Higher-severity alerts suppress lower-severity alerts for the same context.
- Silences: Temporary muting of alerts during maintenance windows.
- Routing: Label matching determines which receiver (PagerDuty, Slack, email) gets the alert.
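The effect of group_by on notification volume can be sketched in a few lines. This is a simplified model of grouping only (it ignores group_wait, inhibition, and routing), and the alert labels are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels; one bucket = one notification."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return groups


# Hypothetical outage: the same alert fires for 200 pods in two namespaces.
alerts = [
    {"alertname": "HighErrorRate", "namespace": "payments", "pod": f"api-{i}"}
    for i in range(150)
] + [
    {"alertname": "HighErrorRate", "namespace": "orders", "pod": f"api-{i}"}
    for i in range(50)
]

for group_by in (["alertname"], ["alertname", "namespace"], ["alertname", "pod"]):
    print(group_by, len(group_alerts(alerts, group_by)))
```

Grouping by pod turns one incident into a notification per distinct pod label, while grouping by alertname and namespace keeps it to one notification per affected namespace.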
The group_by field in Alertmanager controls notification batching. If you group by alertname only, all pods with the same alert are grouped into one notification, which is good for reducing noise. If you group by alertname and pod, each pod gets its own notification, which is bad during a cluster-wide outage where 200 pods trigger the same alert. The production default should be group_by: ['alertname', 'namespace'] to batch by alert and namespace. Use group_wait: 30s to allow grouping before sending.
PromQL Common Query Cheat Sheet
PromQL is the query language for Prometheus. Whether you're building Grafana dashboards, writing alerting rules, or debugging a production issue, having a mental library of common PromQL patterns saves hours. Below is a cheat sheet of the most useful queries for Kubernetes monitoring.
Infrastructure Metrics
- CPU usage per pod: rate(container_cpu_usage_seconds_total{container!="POD", image!=""}[5m])
- Memory usage per pod: container_memory_working_set_bytes{container!="POD", image!=""}
- Network receive bytes per pod: rate(container_network_receive_bytes_total[5m])
- Disk reads per pod: rate(container_fs_reads_bytes_total[5m])
Application Metrics (assuming custom metrics like payment_service_http_requests_total)
- Request rate per second: rate(payment_service_http_requests_total[5m])
- Error rate per second: rate(payment_service_http_requests_total{status_code=~"5.."}[5m])
- Error ratio (fraction of requests returning 5xx; multiply by 100 for a percentage): sum(rate(payment_service_http_requests_total{status_code=~"5.."}[5m])) / sum(rate(payment_service_http_requests_total[5m]))
- p99 latency: histogram_quantile(0.99, sum(rate(payment_service_http_request_duration_seconds_bucket[5m])) by (le, endpoint))
- Average latency: rate(payment_service_http_request_duration_seconds_sum[5m]) / rate(payment_service_http_request_duration_seconds_count[5m])
Prometheus Self-Monitoring
- Active time series: prometheus_tsdb_head_series
- Memory usage: process_resident_memory_bytes
- Scrape duration per job: scrape_duration_seconds
- Targets up per job: up (returns 1 if target is up, 0 if down)
Alerting Patterns
- Pager-worthy: job_endpoint:payment_service_error_ratio:rate5m > (14.4 * 0.001) (14.4 x 0.1% = 1.44% error ratio over 5m, a pace that would exhaust a 30-day error budget in about 50 hours)
- No data: absent(prometheus_tsdb_head_series) fires when the series is missing. Evaluate it from a second Prometheus or an external watchdog, because a Prometheus that is down cannot evaluate its own rules.
- Restart detection: changes(process_start_time_seconds[1h]) > 0
Cardinality Detection
- Top 10 metric names by series count: topk(10, count by (__name__)({__name__=~".+"}))
- Series count for a specific metric: count(payment_service_http_requests_total)
promql-cheat-sheet.promql
    # ---- Infrastructure Metrics ----
    # CPU usage rate (5m window) per pod
    rate(container_cpu_usage_seconds_total{container!="POD", image!=""}[5m])
    # Memory working set per pod (Gauge, no rate)
    container_memory_working_set_bytes{container!="POD", image!=""}
    # Network bytes received per second per pod
    rate(container_network_receive_bytes_total[5m])
    # ---- Application Request Metrics ----
    # Total request rate
    sum(rate(payment_service_http_requests_total[5m]))
    # Error ratio (fraction of requests returning 5xx)
    sum(rate(payment_service_http_requests_total{status_code=~"5.."}[5m])) / sum(rate(payment_service_http_requests_total[5m]))
    # p99 latency (uses histogram_quantile)
    histogram_quantile(0.99, sum(rate(payment_service_http_request_duration_seconds_bucket[5m])) by (le, endpoint))
    # ---- Prometheus Self-Monitoring ----
    # Active time series count
    prometheus_tsdb_head_series
    # Prometheus process memory
    process_resident_memory_bytes
    # ---- Cardinality Analysis ----
    # Top 10 metric names by number of time series
    topk(10, count by (__name__)({__name__=~".+"}))
    # ---- Alerting Patterns ----
    # High error burn rate (SLO-based)
    job_endpoint:payment_service_error_ratio:rate5m > (14.4 * 0.001)
    # Detect a process restart in the last hour (start time changed)
    changes(process_start_time_seconds[1h]) > 0
    # Absence of data (series missing; evaluate from outside the monitored Prometheus)
    absent(prometheus_tsdb_head_series)
These PromQL queries can be executed directly in the Prometheus UI, in Grafana panels, or in alert rules.
Always Use Recording Rules for Repeated Queries
The error ratio query above is expensive: it scans all error series and total series, then divides. If you use this query in five dashboards and one alert, you're executing it six times every evaluation cycle. Create a recording rule job_endpoint:payment_service_error_ratio:rate5m and query that instead.
Every PromQL query you write should be a candidate for a recording rule if it appears in more than one place.
- Recording rules pre-compute expensive queries into cheap time series.
- Use rate() for counters, not raw values (counter resets break averages).
- Use histogram_quantile() with sum by (le, ...) for aggregated percentiles.
- Avoid unfiltered selectors in production: always filter by at least one label.
- Check query performance with the Prometheus UI's query analysis (explain button).
The most common PromQL mistake is using rate() on a gauge: it produces meaningless results because gauges can go down. Another frequent error is forgetting to sum by the correct labels when using histogram_quantile(). You must include le (bucket upper bound) in the by clause, otherwise the quantile cannot be computed. When debugging high memory, the topk(10, count by (__name__)({__name__=~".+"})) query quickly identifies which metric name has the most time series; often the culprit is a high-cardinality label on a single metric.
Master these core PromQL patterns: rate (counters), gauge (raw values), histogram_quantile (latency percentiles), topk (cardinality detection). Use recording rules to optimize expensive queries. Always filter by at least one label to avoid scanning the entire TSDB.
Prometheus Storage: TSDB Internals, Retention, and Thanos/Cortex for Long-Term
Prometheus stores metrics in its own time-series database (TSDB). Understanding TSDB internals is critical for capacity planning, retention tuning, and deciding when to add long-term storage.
TSDB stores data in blocks. Each block covers a 2-hour time range and contains a chunks directory (compressed metric samples) and an index. The head block is the in-memory part that receives all new samples, backed by a write-ahead log (WAL) on disk. Every 2 hours, the head block is compacted into a persistent block and flushed to disk. Old blocks are compacted into larger blocks (e.g., 2h blocks into 1-day blocks) to reduce the number of files.
prometheus-retention-config.yaml
    # Prometheus StatefulSet with retention and storage configuration.
    # For the kube-prometheus-stack Helm chart, these go in prometheus.prometheusSpec.
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: kube-prometheus
      namespace: monitoring
    spec:
      # Retention: how long to keep data locally.
      # 15d is typical. Longer retention = more disk and memory.
      retention: 15d

      # Retention by size: delete oldest blocks when storage exceeds this.
      # Use this as a safety net alongside time-based retention.
      retentionSize: 50GB

      # Storage: PVC for persistent TSDB blocks.
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: fast-ssd
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi

      # Resources: Prometheus is memory-hungry.
      # Rule of thumb: ~16KB per active time series.
      # 1M series = 16GB RAM. Plan accordingly.
      resources:
        requests:
          cpu: '1'
          memory: 16Gi
        limits:
          cpu: '4'
          memory: 32Gi

      # External labels: applied to all metrics when using Thanos/Cortex.
      # Identifies which Prometheus instance scraped the data.
      externalLabels:
        cluster: production-us-east-1
        environment: production

      # Thanos sidecar: uploads blocks to object storage for long-term retention.
      thanos:
        objectStorageConfig:
          name: thanos-objstore-config
          key: objstore.yml

      # Sample limit: max series per scrape target, a safety net against
      # high-cardinality targets, is set per scrape config (sample_limit).
      # The two settings below are kube-prometheus-stack Helm values, not
      # Prometheus CR fields. They let the Operator pick up ServiceMonitors
      # and PodMonitors that lack the Helm release label.
      serviceMonitorSelectorNilUsesHelmValues: false
      podMonitorSelectorNilUsesHelmValues: false
Prometheus configured with 15-day retention, 50GB size limit, fast SSD storage, and Thanos sidecar for long-term block upload to object storage.
When to Add Thanos or Cortex
Prometheus is designed for short-term, per-cluster monitoring. It cannot federate queries across clusters, cannot store data longer than a few weeks efficiently, and is a single point of failure. Thanos and Cortex solve these problems.
Add Thanos when you need: cross-cluster query federation, retention beyond 30 days, or HA for Prometheus.
- Prometheus: Single cluster, short-term (days to weeks). No native HA or federation.
- Thanos: Sidecar uploads blocks to S3/GCS. Querier federates across Prometheus instances. Compactor reduces storage costs. Best for multi-cluster with object storage.
- Cortex: Horizontally scalable multi-tenant Prometheus backend. Used by Grafana Cloud. Best for multi-tenant SaaS platforms.
- VictoriaMetrics: Drop-in Prometheus replacement with better compression and lower resource usage. Best for single-cluster with high cardinality.
- Decision: Use Thanos for multi-cluster with object storage. Use Cortex for multi-tenant SaaS. Use VictoriaMetrics for single-cluster resource optimization.
Prometheus's memory usage is directly proportional to the number of active time series in the TSDB head block. Each active series consumes approximately 16KB of memory. If you have 2 million active series, you need approximately 32GB of RAM for the head block alone, plus overhead for queries and compaction. Monitor prometheus_tsdb_head_series and process_resident_memory_bytes. Set retention based on disk capacity. As a starting formula, disk is roughly retention in seconds times samples ingested per second times bytes per sample. At the 1 to 2 bytes per sample that Prometheus typically achieves, 15 days at 1M series with a 15s scrape interval comes to roughly 85GB to 175GB, so treat any single figure as an estimate and measure your own ingestion.
Use fast SSDs for TSDB. Network-attached storage introduces latency that slows compaction and can cause WAL corruption during power loss.
Prometheus Storage Decision Tree
- Single cluster, retention under 15 days, under 5M active series: Standalone Prometheus with local TSDB and PVC on fast SSD. No additional components needed.
- Multiple clusters, need cross-cluster query federation: Deploy Thanos Sidecar on each Prometheus. Add Thanos Querier for global view. Add Thanos Store Gateway for historical queries.
- Retention beyond 30 days or more than 5M active series: Add Thanos Sidecar with object storage (S3/GCS). Add Thanos Compactor to reduce storage costs. Keep local retention short (7d) and rely on object storage for long-term.
- Multi-tenant SaaS platform with per-customer isolation: Use Cortex or Grafana Mimir for horizontal scalability and per-tenant resource limits.
- Single cluster with extreme cardinality (10M+ series): Consider VictoriaMetrics as a drop-in replacement. It offers better compression (10x) and lower memory usage than Prometheus TSDB.
Prometheus TSDB is a block-based storage engine with an in-memory head block. Memory is proportional to active series count. Use fast SSDs for TSDB. Add Thanos for cross-cluster federation and long-term retention beyond local disk capacity. Plan capacity based on series count: 1M series equals approximately 16GB RAM, and disk should be sized from ingestion rate, retention, and measured bytes per sample.
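The sizing rules in this section can be turned into a quick calculator. The sketch below uses the article's rule of thumb of about 16KB of RAM per active series and, for disk, the general formula of retention time multiplied by ingestion rate multiplied by bytes per sample. The bytes-per-sample values are assumptions (Prometheus's documentation cites an average of 1 to 2 bytes), so treat the output as an estimate only.

```python
BYTES_PER_SERIES = 16_000  # the article's rule of thumb, not a measured value


def head_memory_gb(active_series):
    """Approximate head-block RAM in GB for a given number of active series."""
    return active_series * BYTES_PER_SERIES / 1e9


def disk_gb(active_series, scrape_interval_s, retention_days, bytes_per_sample):
    """Approximate TSDB disk usage in GB: retention * ingestion rate * sample size."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return samples_per_second * retention_seconds * bytes_per_sample / 1e9


print(head_memory_gb(1_000_000))  # RAM for 1M active series
print(head_memory_gb(2_000_000))  # RAM for 2M active series

# Disk for 1M series, 15s scrape interval, 15 days, at two assumed sample sizes.
for bytes_per_sample in (1.0, 2.0):
    print(bytes_per_sample, round(disk_gb(1_000_000, 15, 15, bytes_per_sample), 1))
```

The disk estimate is very sensitive to the assumed sample size and to series churn, so it is worth comparing against the actual size of the data directory once the instance has been running for a few days.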
Prometheus OOMKill from High-Cardinality Label Explosion
A user_id label was added to a request counter to track per-user request counts. With 50,000 unique users per hour and 5 combinations of the remaining labels (method, endpoint, status_code) per user, the metric generated 50,000 * 5 = 250,000 new time series per hour. Each time series consumes memory in Prometheus's TSDB head block. After 6 hours, the head block contained about 1.5 million active series for a single metric, consuming 24GB of RAM. The Prometheus pod was configured with a 16GB memory limit and was OOMKilled.
1. Removed the user_id label from the counter immediately and redeployed the application.
2. Added a Prometheus recording rule to aggregate by user tier (free, premium, enterprise) instead of individual user_id.
3. Added sample_limit: 1000 to the scrape config to prevent future label explosions from a single target.
4. Deployed a cardinality-linter CI check that rejects metrics with more than 3 labels in code review.
5. Added a Prometheus alert on prometheus_tsdb_head_series > 1000000 to catch future explosions early.
- High-cardinality labels (user_id, request_id, trace_id) will destroy Prometheus. Never add unbounded values as label values.
- Each unique combination of label values creates a new time series. 5 labels with 10 values each = 100,000 series per metric name.
- Set sample_limit on scrape configs as a safety net. It rejects scrapes from targets that expose too many series.
- Monitor Prometheus's own metrics: prometheus_tsdb_head_series, prometheus_tsdb_head_chunks, and memory usage. Alert before OOMKill.
- Enforce cardinality limits in CI/CD. A single bad label can take down monitoring for the entire cluster.
Check that the prometheus.io/scrape annotation is set. Check NetworkPolicy: Prometheus must be able to reach the target pod's IP.
Check the for duration: the alert may be in PENDING state. Check Alertmanager routing: the alert may be firing but silenced or routed to the wrong receiver.
Check prometheus_tsdb_head_series: if over 5M, you have a cardinality problem. Run promtool tsdb analyze to find the highest-cardinality metric names. Look for labels with unbounded values (user_id, request_id).
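The incident arithmetic can be checked with a short sketch. It uses only the figures quoted in this section (50,000 users per hour, 5 label combinations per user, roughly 16KB of RAM per series, a 16GB limit); the label names in the first example are illustrative.

```python
from math import prod

BYTES_PER_SERIES = 16_000  # the article's rule of thumb


def max_series(label_value_counts):
    """Worst-case series count for one metric: the product of per-label value counts."""
    return prod(label_value_counts.values())


def incident_series(users_per_hour, combos_per_user, hours):
    """Series accumulated in the head block when every hour brings new user_ids."""
    return users_per_hour * combos_per_user * hours


def hours_until_oom(memory_limit_gb, users_per_hour, combos_per_user):
    """Hours of growth before estimated head memory reaches the pod limit."""
    series_at_limit = memory_limit_gb * 1e9 / BYTES_PER_SERIES
    return series_at_limit / (users_per_hour * combos_per_user)


# 5 labels with 10 values each (illustrative label names).
print(max_series({"method": 10, "endpoint": 10, "status_code": 10, "region": 10, "tier": 10}))

series = incident_series(50_000, 5, 6)
print(series, series * BYTES_PER_SERIES / 1e9)  # series count and estimated GB of RAM
print(hours_until_oom(16, 50_000, 5))           # hours until the 16GB limit is reached
```

By this estimate the pod reaches its memory limit at about one million series, after roughly four hours, which is why an alert at prometheus_tsdb_head_series > 1000000 is a sensible early warning for this deployment.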