Alert Guidelines and Management

Uploaded by

dine2k

Alerting Guidelines

The key objective of creating an Alert is to let the engineering organization
quickly identify and resolve an issue before it becomes an incident.
Every Alert must cover two aspects that answer the question of why it is being
created:
1) Its relevance to the business metrics that are being monitored.
2) The intended action when someone is notified.

Apart from the Alerts based on the Golden Signals (Volume, Availability, Latency and
Errors), the team may want to measure something more granular, such as a
specific flow or functionality.

Every Alert should be actionable and relevant. It is generally good practice to
increase the priority as each subsequent threshold is breached.
Codifying Alerts helps to manage the Alert life cycle better.
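
As a rough illustration of raising the priority as successive thresholds are breached, the sketch below keeps the thresholds in code so they can be reviewed and versioned. The metric (error rate in percent), the threshold values, and the names Threshold and priority_for are assumptions made for this example; only the P1 to P5 priority labels follow Opsgenie's convention.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Threshold:
    limit: float    # metric value at or above which this level applies
    priority: str   # Opsgenie-style priority label, P1 (highest) to P5


# Hypothetical error-rate thresholds in percent; real values are team-specific.
ERROR_RATE_THRESHOLDS = [
    Threshold(limit=1.0, priority="P4"),
    Threshold(limit=5.0, priority="P3"),
    Threshold(limit=10.0, priority="P2"),
    Threshold(limit=25.0, priority="P1"),
]


def priority_for(value: float, thresholds: Sequence[Threshold]) -> Optional[str]:
    """Return the priority of the highest threshold breached, or None if none is."""
    breached = [t for t in thresholds if value >= t.limit]
    if not breached:
        return None
    # The highest breached limit decides the priority, so priority rises with the metric.
    return max(breached, key=lambda t: t.limit).priority
```

With these example values, an error rate of 7 percent would map to P3 and anything at or above 25 percent to P1.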

How to Approach
1) Review all the business health checks each team runs to understand
the business flows that need to be monitored and to identify alerting needs.
2) Look at places in the code that send explicit email/Slack notifications and
review whether each one is still needed.
   a. Enhance New Relic logs so that New Relic alerts can be configured on them.
3) Create an Opsgenie listener per team for the email, New Relic, and Azure integrations.
4) Define standard templates for each listener integration so that the service
name and other details are transmitted in a standard format.

Standard templates for email


Subject
From Name
From Address
Conversation Subject
Message

Standard templates for New Relic


This will be available via Terraform modules.
https://support.atlassian.com/opsgenie/docs/integrate-opsgenie-with-new-relic-alerts-new/

Define the Payload Template:

{
  "tags": "tag1,tag2",
  "teams": "team1,team2",
  "recipients": "user1,user2",
  "payload": {
    "condition_id": {{json accumulations.conditionFamilyId.[0]}},
    "condition_name": {{json accumulations.conditionName.[0]}},
    "current_state": {{#if issueClosedAtUtc}} "closed" {{else if issueAcknowledgedAt}} "acknowledged" {{else}} "open" {{/if}},
    "details": {{json issueTitle}},
    "event_type": "Incident",
    "incident_acknowledge_url": {{json issueAckUrl}},
    "incident_api_url": "N/A",
    "incident_id": {{json issueId}},
    "incident_url": {{json issuePageUrl}},
    "owner": "N/A",
    "policy_name": {{json accumulations.policyName.[0]}},
    "policy_url": {{json issuePageUrl}},
    "runbook_url": {{json accumulations.runbookUrl.[0]}},
    "severity": {{#eq "HIGH" priority}} "WARNING" {{else}} {{json priority}} {{/eq}},
    "targets": {
      "id": {{json entitiesData.entities.[0].id}},
      "name": {{json entitiesData.entities.[0].name}},
      "type": "{{entitiesData.entities.[0].type}}",
      "product": "{{accumulations.conditionProduct.[0]}}"
    },
    "timestamp": {{#if closedAt}} {{closedAt}} {{else if acknowledgedAt}} {{acknowledgedAt}} {{else}} {{createdAt}} {{/if}}
  }
}
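
The three conditional fields in the template (current_state, severity, timestamp) are the easiest to get wrong when editing it. The sketch below mirrors that branching in Python so the expected values can be checked against sample issues. It is only an approximation of how the template engine evaluates the conditions, and the function names and the plain-dict input are assumptions made for this example.

```python
def current_state(issue: dict) -> str:
    """Mirror the template's current_state branch: closed, acknowledged, else open."""
    if issue.get("issueClosedAtUtc"):
        return "closed"
    if issue.get("issueAcknowledgedAt"):
        return "acknowledged"
    return "open"


def severity(issue: dict):
    """Mirror the severity branch: a HIGH priority is reported as WARNING,
    any other priority is passed through unchanged."""
    priority = issue.get("priority")
    return "WARNING" if priority == "HIGH" else priority


def timestamp(issue: dict):
    """Mirror the timestamp branch: closedAt, then acknowledgedAt, then createdAt."""
    return issue.get("closedAt") or issue.get("acknowledgedAt") or issue.get("createdAt")


# Hypothetical sample issue: acknowledged but not yet closed.
sample = {
    "priority": "HIGH",
    "issueAcknowledgedAt": 1700000300,
    "acknowledgedAt": 1700000300,
    "createdAt": 1700000000,
}
print(current_state(sample), severity(sample), timestamp(sample))
```

For the sample issue this yields an acknowledged state, a WARNING severity, and the acknowledgement time as the timestamp.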

Standard templates for Azure integration


This will be available via Terraform modules.
https://support.atlassian.com/opsgenie/docs/integrate-opsgenie-with-microsoft-azure/

Current Application and Tools

Application Type        Opsgenie Integration
SkyBot                  email
Web                     email, New Relic
API                     email, New Relic
Windows Services        email, New Relic
Console                 email, New Relic
Azure Batch             email, New Relic, Azure
Azure Functions         email, New Relic, Azure
Azure Message Queues    email, New Relic, Azure
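
One possible way to codify the mapping above is a small lookup table, so that listener setup can be driven from data rather than from a document. The structure and the function names are assumptions for illustration; the application types and integrations are taken from the list above.

```python
# Opsgenie integrations per application type, as listed in this document.
INTEGRATIONS_BY_APP_TYPE = {
    "SkyBot": ("email",),
    "Web": ("email", "New Relic"),
    "API": ("email", "New Relic"),
    "Windows Services": ("email", "New Relic"),
    "Console": ("email", "New Relic"),
    "Azure Batch": ("email", "New Relic", "Azure"),
    "Azure Functions": ("email", "New Relic", "Azure"),
    "Azure Message Queues": ("email", "New Relic", "Azure"),
}


def integrations_for(app_type: str) -> tuple:
    """Return the integrations for an application type, or an empty tuple if unknown."""
    return INTEGRATIONS_BY_APP_TYPE.get(app_type, ())


def app_types_using(integration: str) -> list:
    """Return the application types that use the given integration, sorted by name."""
    return sorted(
        app for app, integrations in INTEGRATIONS_BY_APP_TYPE.items()
        if integration in integrations
    )
```

For example, app_types_using("Azure") lists the three Azure application types, which are the ones that would need the Azure listener template.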
