Alert Guidelines and Management
The key objective of creating an alert is to let the engineering organization
quickly identify and resolve issues before they become incidents.
Every alert must cover two aspects, which together answer the question of why
the alert is being created:
1) Relevance to the business metrics that are being monitored.
2) The intended action when someone is notified.
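One way to enforce the two aspects above is to record them as required fields on every alert definition and reject definitions that leave either one blank. The sketch below is illustrative only: the class name, field names, and helper are assumptions, not part of any existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertDefinition:
    """Hypothetical record describing one alert; field names are illustrative."""
    name: str
    business_metric: str  # aspect 1: the monitored business metric this alert relates to
    intended_action: str  # aspect 2: what the notified person is expected to do


def missing_aspects(alert: AlertDefinition) -> list[str]:
    """Return the names of the required aspects that were left blank."""
    missing = []
    if not alert.business_metric.strip():
        missing.append("business_metric")
    if not alert.intended_action.strip():
        missing.append("intended_action")
    return missing
```

An alert whose `missing_aspects` result is non-empty would be sent back to its author before it is created.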
Apart from alerts based on the Golden Signals (Volume, Availability, Latency and
Errors), a team may want to measure something more granular, such as a
specific flow or piece of functionality.
How to Approach
1) Review all the business health checks each team runs to understand which
business flows need to be monitored and to identify alerting needs.
2) Look at places in the code that send explicit email/Slack notifications and
review whether each one is still needed.
a. Enhance New Relic logs so that New Relic alerts can be configured.
3) Create an Opsgenie Listener per team for the email, New Relic, and Azure integrations.
4) Define standard templates for each listener integration so that the service
name and other details are transmitted in a standard format.
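A standard template as described in step 4 could be sketched as a small renderer that always emits the same fields in the same order, whichever integration raised the alert. The field set (team, service, integration, summary) and the `key=value` layout are assumptions chosen for illustration. This is not an Opsgenie format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertTemplate:
    """Hypothetical standard template shared by every listener integration."""
    team: str
    service: str
    integration: str  # e.g. "email", "New Relic" or "Azure"

    def render(self, summary: str) -> str:
        # Fixed field order so every integration produces the same layout.
        return (
            f"team={self.team} | service={self.service} | "
            f"integration={self.integration} | summary={summary}"
        )
```

For example, with made-up team and service names, `AlertTemplate("payments", "invoice-api", "New Relic").render("Error rate above threshold")` produces a single line that starts with `team=payments | service=invoice-api`.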
Opsgenie
Application Type        Integration
SkyBot                  email
Web                     email, New Relic
API                     email, New Relic
Windows Services        email, New Relic
Console                 email, New Relic
Azure Batch             email, New Relic, Azure
Azure Functions         email, New Relic, Azure
Azure Message Queues    email, New Relic, Azure
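The application-type mapping above can also be kept as data so that tooling can look up which integrations a given service should have. The following is a minimal sketch: the dictionary contents come from the table, while the constant and function names are assumptions.

```python
# Application type -> integrations, transcribed from the Opsgenie table.
INTEGRATIONS_BY_APP_TYPE = {
    "SkyBot": ("email",),
    "Web": ("email", "New Relic"),
    "API": ("email", "New Relic"),
    "Windows Services": ("email", "New Relic"),
    "Console": ("email", "New Relic"),
    "Azure Batch": ("email", "New Relic", "Azure"),
    "Azure Functions": ("email", "New Relic", "Azure"),
    "Azure Message Queues": ("email", "New Relic", "Azure"),
}


def integrations_for(app_type: str) -> tuple[str, ...]:
    """Return the integrations expected for an application type."""
    try:
        return INTEGRATIONS_BY_APP_TYPE[app_type]
    except KeyError:
        # Fail loudly so an unlisted application type is noticed and added.
        raise ValueError(f"Unknown application type: {app_type!r}") from None
```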