APM

Alert policies

 

Learn how to:

  • Use alert policies in your environment.
  • Identify components that make up an alert policy.
  • Create an alert policy and set incident preferences.

Transcript:

New Relic Alerting is a flexible, centralized alerting system that allows you to create and manage alert policies that fit your environment, and focus on the metrics you care about most, in a simple and intuitive way. This tutorial will cover how an alert policy works, the components that make up an alert policy, and how to create an alert policy and set incident preferences.

An alert policy is a container for conditions. Each condition has a critical and warning threshold level for a specific part of an app, such as Apdex, error rate, or a specific key transaction, to name just a few examples. When an app's performance violates a condition's threshold settings, it will trigger an alert. When you set meaningful thresholds for your app in each condition, your policies will trigger alerts that tell you when you need to get in and fix things before more serious problems arise.
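To make the container idea concrete, here is a minimal sketch of a policy that holds conditions, each with a critical threshold and an optional warning threshold. The class names, field names, and the "above the value" comparison are assumptions made for illustration; this is not New Relic's data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Threshold:
    """One threshold level. Assumption: it is violated when the metric is above `value`."""
    value: float


@dataclass
class Condition:
    """Hypothetical model of a condition: a metric plus its threshold levels."""
    name: str
    metric: str
    critical: Threshold
    warning: Optional[Threshold] = None  # the warning level is optional

    def evaluate(self, observed: float) -> Optional[str]:
        """Return "critical", "warning", or None for a single observed value."""
        if observed > self.critical.value:
            return "critical"
        if self.warning is not None and observed > self.warning.value:
            return "warning"
        return None


@dataclass
class AlertPolicy:
    """A policy is a named container for conditions."""
    name: str
    conditions: List[Condition] = field(default_factory=list)


policy = AlertPolicy(name="Example policy")
policy.conditions.append(
    Condition(
        name="Error rate (High)",
        metric="error_rate_percent",  # hypothetical metric name
        critical=Threshold(20.0),
        warning=Threshold(10.0),
    )
)

for condition in policy.conditions:
    print(condition.name, condition.evaluate(12.5))
```

A metric like Apdex would be violated when it falls below a value rather than rising above it, so a fuller model would also store the comparison direction.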

To create and view alert policies for your app from your APM account, click on the "alerts" tab in the top menu bar. If you haven't ever used the alerting feature for your account, you will see this welcome page, and a step-by-step guide to walk you through creating a policy. However, since these only appear when you first use Alerting, I'm gonna walk you through this process without using these guides.

The default starting page for alerting is "open incidents". Right now, I don't have any incidents, so we see this page telling us to start with alert policies. So I'll go ahead and click on the alert policies tab in the "alerting" submenu at the top, which takes me to a list view of all alert policies associated with my account.

Below the alerting submenu on the right is a plus icon and a link to create a new alert policy. Before we create a new policy, let's start by opening an existing one to see what elements make up a policy, and how alert conditions are set up.

At the very top is the name of the policy, and to the right of that is an icon to edit the policy preferences, which allows you to specify how incidents should be created when conditions in the alert policy are violated. I'll go over these settings later.

Next to that is an icon to delete the policy. Below this are two tabs to view the policy's conditions and notification channels. We'll take a look at the conditions first.

In the conditions tab we see a search bar to help find a specific condition, and to the right of that is an icon to add a condition to the policy. Below this is a list of conditions, each of which is a set of performance criteria for a target. Under each condition is the number of targets, such as the transactions or apps, that are associated with the condition. You can click on this to open a panel to see these targets.

Each condition has a critical threshold indicated by the red hexagonal icon, and an optional warning level indicated by the yellow triangular icon. Each threshold has a short description of its settings. Anytime a target's performance violates one of these thresholds, a violation occurs. Remember, alert notifications are only sent for incidents created by critical violations, and they go out through the notification channels configured in the policy. We'll go over how to set up notification channels in a later alerting tutorial. If no critical violation occurs to trigger an alert and create an incident, you can still view any warning violations under the events tab and use them to proactively address potential performance problems.
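The difference between the two levels can be sketched as a small routing step: critical violations go on to open an incident and send notifications, while warning violations are only recorded as events. The dictionary keys and the function name are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Violation = Dict[str, str]  # hypothetical shape: {"condition": ..., "level": ...}


def route_violations(violations: List[Violation]) -> Tuple[List[Violation], List[Violation]]:
    """Split violations into those that notify and those that are only logged as events."""
    notify: List[Violation] = []
    events_only: List[Violation] = []
    for violation in violations:
        if violation["level"] == "critical":
            notify.append(violation)       # opens an incident and sends a notification
        else:
            events_only.append(violation)  # visible as an event, no notification
    return notify, events_only


violations = [
    {"condition": "Error rate (High)", "level": "warning"},
    {"condition": "Error rate (High)", "level": "critical"},
]
notify, events_only = route_violations(violations)
print(len(notify), len(events_only))
```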

Let's try creating a policy now. First, I'll click back to the alert policies list view, and then I'll click on the "new alert policy" link in the upper right corner of the policy list view page to open a text field where I can give the policy a name.

For this example, let's say I need to create an alert policy for the New Relic University tutorials page, which is using browser monitoring. It's important to give alert policies unique and descriptive names, so their purpose and scope are clear. A great way to do this is to name policies based on who will be responsible for responding to alerts.

Let's say this new policy I'm creating will be the responsibility of the New Relic University team and will focus on error rate, so I'll give this policy the name "NRU Team Error Rate". I'll save my new policy and then click on it to open it. Right now it doesn't have any conditions, so the first thing I'll do is create one.

There are three steps to creating a condition. First, categorizing the condition; then, selecting targets; and finally, defining thresholds.

First, we'll select a category of product from the list. For this policy, I want Browser. Next, we'll select the metric type. The type changes based on the product selection; in this instance, metric is my only option.

Next, we'll select a target for this condition. In order to select the target, first I'll choose the application I want to receive alerts for. I can search for it using the search bar here, or I could go to the specific product dashboard and use the APM drill-down features to find things like specific transactions.

Finally, I'll define thresholds for this condition. To define a threshold, I click the "When target application" metric drop-down. The options here will change based on the product and application I'm setting thresholds for. Since I'm tracking errors, I'll just use the "page views with JavaScript errors" metric under "other metrics." Then, I'll need to set the threshold as "above", "below", or "equal to".

In my case, I want to trigger an alert if the error rate is higher than I think is acceptable for this page, so I'll set the critical threshold as "has the percentage above twenty percent for at least five minutes". And because I think there are some performance improvements I can make, I'm going to set a lower warning threshold, so I can keep a closer eye on this. I'll make the warning threshold activate when the app has a percentage above ten percent at least once in one hour.
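The two duration rules behave differently, which a short sketch can show. It assumes one error-rate sample per minute, reads "for at least five minutes" as every one of the last five samples being above the limit, and reads "at least once in one hour" as any sample in the last sixty being above it. The function names and sample values are made up, and this is not how New Relic evaluates thresholds internally.

```python
from typing import Sequence


def above_for_at_least(samples: Sequence[float], limit: float, minutes: int) -> bool:
    """True if each of the most recent `minutes` samples is above `limit`."""
    if len(samples) < minutes:
        return False  # not enough data yet to cover the whole window
    return all(value > limit for value in samples[-minutes:])


def above_at_least_once(samples: Sequence[float], limit: float, minutes: int) -> bool:
    """True if any of the most recent `minutes` samples is above `limit`."""
    return any(value > limit for value in samples[-minutes:])


# Hypothetical per-minute error-rate percentages, oldest first.
error_rate = [4.0, 12.0, 6.0, 22.0, 25.0, 21.0, 30.0, 24.0]

critical = above_for_at_least(error_rate, limit=20.0, minutes=5)
warning = above_at_least_once(error_rate, limit=10.0, minutes=60)
print(critical, warning)
```

With these samples both checks are true: the last five values all exceed twenty percent, and at least one value in the window exceeds ten percent. A single dip to twenty or below within the five-minute window would clear the critical check while the warning check stayed true.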

I also have the option to give this condition a custom name. For this example, the default suggestion is "Page views with JavaScript errors (High)", which I think is pretty accurate, so I'll leave that. If you do decide to give a condition a custom name, it's important to think about what the condition represents, so it's clear to others what the condition is doing.

Optionally, you can set up a Runbook URL. Runbook is an open source monitoring service that allows you to perform automated reactions when issues are detected. For alerting purposes, this is helpful in documenting and troubleshooting issues. For more information on how to use this with New Relic alerting, visit the docs site. You can also find out more about Runbook at runbook.io.

Finally, I'll click create condition. Before we leave this policy page, I want to show you one more setting you can configure. Because the purpose of an alert policy is to send you an alert and create an incident when its conditions are violated, you can change your preferences for how incidents are created and violation events are grouped. Incident preferences can be found in the upper right corner on each policy page. The default setting is "by policy"; this means only one open incident will be created at a time for this alert policy, and all violations will show up in that single incident while it's open.

If I set it to "by condition," an open incident will be created for each individual condition within a policy. If the policy has multiple conditions, I will see a separate incident for each condition. This will result in more frequent notifications for much smaller incidents, and can be helpful if I want an individual incident record to focus on a specific condition. And if I set it to "by condition and target," an incident will be created for each condition and target assigned to the policy, and I will see an incident for every violation that occurs within the policy. This option is the most granular level for creating incident records and can be useful if I want to closely monitor anything occurring across my app's entire infrastructure.
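The three incident preferences amount to three different grouping keys. The sketch below groups the same set of violations under each preference to show how the number of incidents changes. The preference strings, condition names, and target names are hypothetical labels for illustration, not values from New Relic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Violation = Tuple[str, str]  # (condition name, target name)


def group_into_incidents(
    violations: List[Violation], preference: str
) -> Dict[Tuple[str, ...], List[Violation]]:
    """Group one policy's violations into incidents according to the preference."""
    incidents: Dict[Tuple[str, ...], List[Violation]] = defaultdict(list)
    for condition, target in violations:
        if preference == "by_policy":
            key: Tuple[str, ...] = ()          # one incident for the whole policy
        elif preference == "by_condition":
            key = (condition,)                 # one incident per condition
        elif preference == "by_condition_and_target":
            key = (condition, target)          # one incident per condition and target
        else:
            raise ValueError(f"unknown preference: {preference}")
        incidents[key].append((condition, target))
    return dict(incidents)


violations = [
    ("JS errors (High)", "tutorials-page"),
    ("JS errors (High)", "home-page"),
    ("Page load time (High)", "tutorials-page"),
]

for preference in ("by_policy", "by_condition", "by_condition_and_target"):
    print(preference, len(group_into_incidents(violations, preference)))
```

For these three violations the sketch yields one incident by policy, two by condition, and three by condition and target, which matches the move from least to most granular described above.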

You can visit the docs site for "reviewing alerting incidents" for a more detailed description of these settings. In this case, I'm going to leave my preferences on the default, "by policy," since I only want to look at error rate violations in a single incident. At any point, I can go back to this policy and add more conditions by simply clicking the "add a condition" icon in the upper right corner.

Now that you know how to create alert policies, you're ready to set up alerts within your app, so you can respond to problems more quickly and effectively to improve your app's performance. In the next tutorial, we'll cover the lifecycle of an incident.