
Handling missing data in Grafana Alerting

Missing data, where a target stops reporting metric data, is one of the most common issues when troubleshooting alerts. In cloud-native environments, this happens all the time: pods or nodes scale down to match demand, or an entire job quietly disappears.

When this happens, alerts won't fire, and you might not notice the system has stopped reporting.

Sometimes it’s just a lack of data from a few instances. Other times, it’s a connectivity issue where the entire target is unreachable.

This guide covers different scenarios where the underlying data is missing and how to design your alerts to act on those cases. If you’re troubleshooting an unreachable host or a network failure, see Handling connectivity errors as well.

No Data vs. Missing Series

There are a few common causes for an instance to stop reporting data, similar to the causes of connectivity errors:

  • Host crash: The system is down, and Prometheus stops scraping the target.
  • Temporary network failures: Intermittent scrape failures cause data gaps.
  • Deployment changes: Decommissioning, Kubernetes pod eviction, or scaling down resources.
  • Ephemeral workloads: Metrics intentionally stop reporting.
  • And more.

The first thing to understand is the difference between a query failure (or connectivity error), No Data, and a Missing Series.

Alert queries often return multiple time series, one per instance, pod, region, or label combination. This is known as a multi-dimensional alert, meaning a single alert rule can trigger multiple alert instances (alerts).

For example, imagine a recorded metric, http_request_latency_seconds, that reports request latency in seconds for each region where the application is deployed. The query returns one series per region (for instance, region1 and region2) and generates two alert instances. In this scenario, you may experience:

  • Connectivity Error if the alert rule query fails.
  • No Data if the query runs successfully but returns no data at all.
  • Missing Series if one or more specific series that previously returned data are now missing, while other series still return data.

In both No Data and Missing Series cases, the query still technically “works”, but the alert won't fire unless you explicitly configure it to handle these situations.

Let's walk through both scenarios using the previous example, with an alert that triggers if the latency exceeds 2 seconds in any region: avg_over_time(http_request_latency_seconds[5m]) > 2
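
To make the multi-dimensional behavior concrete, here is a sketch of what the rule evaluates at a single point in time. The values are hypothetical; the comparison filters out series at or below the threshold, so only breaching series produce alert instances:

avg_over_time(http_request_latency_seconds[5m]) > 2
# Hypothetical result if region1 averages 2.4s and region2 averages 1.0s:
#   {region="region1"}  2.4   -> the region1 alert instance fires
# region2 stays below the threshold, returns no sample, and remains Normal.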

No Data Scenario: The query returns no data for any series:

| Time  | region1      | region2      | Alert triggered                |
|-------|--------------|--------------|--------------------------------|
| 00:00 | 1.5s 🟢      | 1s 🟢        | ✅ No Alert                    |
| 01:00 | No Data ⚠️   | No Data ⚠️   | ⚠️ No Alert (Silent Failure)   |
| 02:00 | No Data ⚠️   | No Data ⚠️   | ⚠️ No Alert (Silent Failure)   |
| 03:00 | 1.4s 🟢      | 1s 🟢        | ✅ No Alert                    |

MissingSeries Scenario: Only a specific series (region2) disappears:

| Time  | region1 | region2            | Alert triggered                |
|-------|---------|--------------------|--------------------------------|
| 00:00 | 1.5s 🟢 | 1s 🟢              | ✅ No Alert                    |
| 01:00 | 1.6s 🟢 | Missing Series ⚠️  | ⚠️ No Alert (Silent Failure)   |
| 02:00 | 1.6s 🟢 | Missing Series ⚠️  | ⚠️ No Alert (Silent Failure)   |
| 03:00 | 1.4s 🟢 | 1s 🟢              | ✅ No Alert                    |

In both cases, something broke silently.

Detecting missing data in Prometheus

Prometheus doesn't fire alerts when the query returns no data. It simply assumes there was nothing to report, like with query errors. Missing data won't trigger existing alerts unless you explicitly check for it.

In Prometheus, a common way to catch missing data is by using the absent_over_time function.

absent_over_time(http_request_latency_seconds[5m]) == 1

This triggers when all series for http_request_latency_seconds are absent for 5 minutes, catching the No Data case where the entire metric disappears.

However, absent_over_time() can't detect which specific series are missing since it doesn't preserve labels. The alert won't tell you which series stopped reporting, only that the query returns no data.
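
As a sketch, here are the two possible outcomes, with hypothetical results shown as comments:

absent_over_time(http_request_latency_seconds[5m])
# If no series reported any samples in the last 5 minutes, the result is a
# single series with no labels, so it can't tell you which regions disappeared:
#   {}  1
# If at least one series (say, region1) still reports, the function returns
# nothing, even though region2 may have vanished:
#   (empty result)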

If you want to check for missing data per region or per label value, you can specify the label in the alert query as follows:

absent_over_time(http_request_latency_seconds{region="region1"}[5m]) == 1
or
absent_over_time(http_request_latency_seconds{region="region2"}[5m]) == 1

But this doesn’t scale well. Hardcoding queries for each label set is fragile, especially in dynamic cloud environments where instances can appear or disappear at any time.
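
One common workaround, not covered in this guide, is to compare the metric against itself at an earlier offset using the unless operator, which keeps the labels of whichever series have disappeared. A minimal sketch, assuming an hour-old snapshot is an acceptable baseline:

http_request_latency_seconds offset 1h
  unless
http_request_latency_seconds
# Returns every series (with its labels) that existed an hour ago but reports
# nothing now, so an alert on this expression names the missing regions.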

Handling No Data in Grafana alerts

While Prometheus provides functions like absent_over_time() to detect missing data, not all of the data sources available to Grafana alerts (Graphite, InfluxDB, PostgreSQL, and others) support a similar function.

To handle this, Grafana Alerting implements built-in No Data state logic, so you don't need to detect missing data with absent_* queries. Instead, the alert rule settings let you configure how alerts behave when no data is returned.

Similar to error handling, Grafana triggers a special No Data alert by default and lets you control this behavior. In Configure no data and error handling, click Alert state if no data or all values are null, and choose one of the following options:

  • No Data (default): Triggers a new DatasourceNoData alert, treating No data as a specific problem.

  • Alerting: Transitions each existing alert instance into the Alerting state when data disappears.

  • Normal: Ignores missing data and transitions all instances to the Normal state. Useful when receiving intermittent data, such as from experimental services, sporadic actions, or periodic reports.

  • Keep Last State: Leaves the alert in its previous state until the data returns. This is useful when brief metric gaps happen regularly, such as with flaky exporters or in noisy environments.

    A screenshot of the `Configure no data handling` option in Grafana Alerting.

Handling DatasourceNoData notifications

When Grafana triggers a NoData alert, it creates a distinct alert instance, separate from the original alert instance. These alerts behave differently:

  • They use a dedicated alertname: DatasourceNoData.
  • They don't inherit all the labels from the original alert instances.
  • They trigger immediately, ignoring the pending period.

Because of this, DatasourceNoData alerts might require a dedicated setup to handle their notifications. For general recommendations, see Reduce redundant DatasourceError alerts; similar practices can apply to NoData alerts.

Evicting alert instances for missing series

MissingSeries occurs when only some series disappear but not all. This case is subtle but important.

Grafana marks missing series as stale after two evaluation intervals and triggers the alert instance eviction process. Here's what happens under the hood:

  • Alert instances with missing data keep their last state for two evaluation intervals.
  • If still missing after that:
    • Grafana adds the annotation grafana_state_reason: MissingSeries.
    • The alert instance transitions to the Normal state.
    • A resolved notification is sent if the alert was previously firing.
    • The alert instance is removed from the Grafana UI.

If an alert instance becomes stale, you'll find it in the alert history as Normal (Missing Series) before it disappears. This table shows the eviction process for the previous example:

| Time  | region1        | region2                    | Alert triggered                                                                           |
|-------|----------------|----------------------------|-------------------------------------------------------------------------------------------|
| 00:00 | 1.5s 🟢        | 1s 🟢                      | 🟢🟢 No Alerts                                                                             |
| 01:00 | 3s 🔴 Alerting | 3s 🔴 Alerting             | 🔴🔴 Alert instances triggered for both regions                                            |
| 02:00 | 1.6s 🟢        | MissingSeries ⚠️ Alerting  | 🟢🔴 region2 missing, state maintained                                                     |
| 03:00 | 1.6s 🟢        | MissingSeries ⚠️ Alerting  | 🟢🔴 region2 missing, state maintained                                                     |
| 04:00 | 1.4s 🟢        | (no series)                | 🟢🟢 region2 becomes Normal (Missing Series), is resolved, and the instance is evicted; 📩 notification sent |
| 05:00 | 1.4s 🟢        | (no series)                | 🟢 No Alerts                                                                               |

Why doesn't MissingSeries match No Data behavior?

In dynamic environments (autoscaling groups, ephemeral pods, spot instances), series naturally come and go. MissingSeries normally signals infrastructure or deployment changes.

By default, No Data triggers an alert to indicate a potential problem.

The eviction process for MissingSeries is designed to prevent alert flapping when a pod or instance disappears, reducing alert noise.

In environments with frequent scale events, prioritize symptom-based alerts over individual infrastructure signals and use aggregate alerts unless you explicitly need to track individual instances.
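
For example, a hypothetical aggregate variant of the earlier latency rule alerts on the overall average across regions, so a single region scaling away doesn't leave behind a stale per-region alert instance:

avg(avg_over_time(http_request_latency_seconds[5m])) > 2
# One alert instance for the service as a whole; individual regions can come
# and go without triggering MissingSeries evictions for this rule.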

Handling MissingSeries notifications

A stale alert instance triggers a resolved notification if it transitions from a firing state (such as Alerting, No Data, or Error) to Normal.

You can display the MissingSeries annotation in notifications to indicate that the alert wasn't resolved by recovery but was evicted because its series data went missing.

Review these notifications to confirm whether something broke or if the alert was unnecessary. To reduce noise:

  • Silence or mute alerts during planned maintenance or rollouts.
  • Adjust alert rules to avoid triggering on series you expect to come and go, and use aggregated alerts instead.

Wrapping up

Missing data isn't always a failure. It's a common scenario in dynamic environments when certain targets stop reporting.

Grafana Alerting handles these distinct scenarios automatically. Here's how to think about it:

  • Use Grafana's No Data handling options to define what happens when a query returns nothing.
  • Understand DatasourceNoData and MissingSeries notifications, since they don't behave like regular alerts.
  • Use absent() or absent_over_time() in Prometheus for fine-grained detection when a metric or label disappears entirely.
  • Don't alert on every instance by default. In dynamic environments, it's better to aggregate and alert on symptoms, unless a missing individual instance directly impacts users.
  • If you're getting too much noise from disappearing data, consider adjusting alerts, using Keep Last State, or routing those alerts differently.
  • For connectivity issues involving alert query failures, see the sibling guide: Handling connectivity errors in Grafana Alerting.