Follow along with dashboards as code examples using New Relic and Terraform. To read the full New Relic blog, click here.
Observability as code (also referred to as o11y as code) is the practice of automating the configuration of observability tools. You manage your infrastructure with code, so why not manage your observability the same way, and your dashboards as well?
This three-part blog series is your guide to o11y as code, providing tips, examples, and guidance. In this series, we'll walk through examples of how you can automate the configuration of your observability tools, starting with dashboards here in part one. Part one covers the basics of Terraform, how to provision a sample app, and how to create dashboards as code.
By the end of the series, you'll have worked with a total of five examples of observability as code using New Relic and HashiCorp's Terraform: dashboards as code, alerts as code, synthetic monitoring as code, tags as code, and workloads as code. You'll be working with data from the sample FoodMe restaurant ordering app.
How did we get here? Infrastructure as code
Since infrastructure as code (also known as IaC) appeared on the scene more than a decade ago, it's become a core requirement in the modern cloud era. The term "as code" means treating infrastructure configuration just like we treat code: pushing configuration into source control, then carefully pushing changes out to the infrastructure layer.
With the rise of modern distributed systems, we also see more outages, and finding the root cause of an issue can be challenging when something goes wrong. Observability fits into the new paradigm because we need to determine the internal states of our systems from their outputs. Observability uses different system outputs such as traces, logs, and metrics to understand the internal state of the distributed components, diagnose where the problems are, and get to the root cause.
Unfortunately, the operational practices we rely on didn't change much, and developers and operations engineers might find they still look at hundreds of alerts or dashboards. This approach leads to non-repeatable, non-standardized dashboard configurations, or to adjusting alerts dynamically to avoid signal fatigue, drifting away from organizational best practices.
But we can use what we know about infrastructure as code to automate observability. Meet the new approach: observability as code, which treats observability configurations as code. As explained in Observability as code simplifies your life, observability as code represents a shift of intention to an auditable, code-managed solution that reduces the work needed to maintain and evolve a configuration.
Understand the basics of Terraform
Terraform by HashiCorp is an infrastructure as code tool that you can use to define and manage infrastructure resources in configuration files that are easily readable by humans. You can declaratively manage services and automate your changes to those services.
In most examples, a Terraform module is a set of Terraform configuration files in a single directory. When you run Terraform commands directly from that single directory, it's considered the root module. Here's what it looks like, as shown in the Terraform docs:
.
├── LICENSE
├── README.md
├── main.tf
├── variables.tf
├── outputs.tf
Terraform files used in this blog series
The tutorial exercises in this blog series focus on two main files:
- The main.tf file contains the main set of configurations for your module. You can also create other configuration files and organize them in a way that makes sense for your project.
- The variables.tf file contains the variable definitions for your module. If you want others to use your module, configure the variables as arguments in the module block.
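As a quick sketch of that pattern (the module path, module name, and variable name below are hypothetical, not from this tutorial), a child module declares an input in its variables.tf, and a caller passes a value for it as an argument in the module block:

```hcl
# child module: modules/dashboards/variables.tf (hypothetical path)
variable "nr_account_id" {
  type        = string
  description = "New Relic account ID to deploy dashboards into"
}

# root module: main.tf, passing the variable as an argument
module "dashboards" {
  source        = "./modules/dashboards"
  nr_account_id = "1234567" # placeholder value
}
```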
Example of a New Relic Terraform provider
Here's an example of a New Relic provider in Terraform from Configuring the New Relic Terraform Provider.
# get the New Relic Terraform provider
terraform {
  required_version = "~> 1.0"
  required_providers {
    newrelic = {
      source = "newrelic/newrelic"
    }
  }
}

# configure the New Relic provider
provider "newrelic" {
  account_id = <Your Account ID>
  api_key    = <Your User API Key> # usually prefixed with 'NRAK'
  region     = "US"                # Valid regions are US and EU
}
You can also use environment variables to configure the provider, which can simplify your provider block. Each provider has key schema attributes, such as account_id, api_key, and region.
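For example, the provider can pick up its credentials from the environment instead of the provider block; the values below are placeholders you'd replace with your own:

```shell
# set New Relic provider credentials in the environment
# instead of hardcoding them in the provider block
export NEW_RELIC_ACCOUNT_ID="1234567"   # placeholder account ID
export NEW_RELIC_API_KEY="NRAK-XXXXX"   # placeholder user API key
export NEW_RELIC_REGION="US"            # US or EU
```

With these set, the provider block can shrink to just `provider "newrelic" {}`.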
Terraform commands to remember
To initialize and run Terraform successfully, remember these four commands:
- The terraform init command performs initialization steps to prepare the current working directory for use with Terraform. This command is safe to run multiple times, to update the working directory with configuration changes.
- The terraform plan command creates an execution plan, which lets you preview the changes that Terraform will make to your infrastructure. You can use this command to check whether the proposed changes match what you expect before you apply them.
- The terraform apply command automatically creates an execution plan, prompts you to approve that plan, and then takes the indicated actions. Follow the prompts, and answer yes to apply the changes. Terraform will then provision the resources.
- The terraform destroy command is a convenient way to remove all the remote objects managed by a particular Terraform configuration. Follow the prompts, and Terraform will delete all the resources.
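Assuming your configuration files live in the current directory, a typical session runs the commands in this order (terraform must already be installed and on your PATH):

```shell
terraform init      # prepare the working directory and download providers
terraform plan      # preview the execution plan without changing anything
terraform apply     # create or update resources after you approve the plan
terraform destroy   # tear down everything this configuration manages
```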
For more information on Terraform commands, see Provisioning infrastructure with Terraform.
The examples in the next sections show key concepts in Terraform such as providers, data sources, and resources. You'll be automating the configuration of New Relic dashboards to view data from the sample FoodMe restaurant app.
This blog post demo uses the newrelic_one_dashboard resource. If you want to use the newrelic_one_dashboard_json resource instead, see the Creating dashboards with Terraform and JSON templates tutorial.
Before you begin provisioning your first Terraform module
For this tutorial, we're going to provision a sample app. But before you provision your first Terraform module, you'll need to get an account ID, your user key, and point to the correct data center:
This video walkthrough covers the prerequisite work.
Automate Configuration with Observability as Code | New Relic
Provision the sample app
Before we work on implementing observability as code, let's start by provisioning our sample app!
- Generate your unique URL for the FoodMe example app with this Glitch link: glitch.com/edit/#!/remix/nr-devrel-o11yascode
- Set the environment variables. Go to .env and insert these values:
  - LICENSE_KEY: Insert your New Relic ingest API key.
  - APP_NAME: Insert your name or initials into the name of the app, FoodMe-XXX (for example, FoodMe-Jan).
- Preview your URI. Go to Tools (at the bottom of the panel), and select Preview in a new window.
- Record your URL. Note your newly generated URL. You'll use this later in part two of the series for synthetic monitoring as code.
- Generate some workloads. Now that you're in the sample app, enter an example name and delivery address, and select Find Restaurants! Once you're on the main page, click around to generate some workloads for the sample app. We'll need some data to look at in the dashboards.
Create dashboards as code
Now we're ready for our first observability as code example: dashboards as code. With New Relic custom dashboards, you can collect and visualize the specific data that you want to see and display in New Relic. You'll learn how to automate configuring dashboards in New Relic using Terraform.
There are three main steps. To see everything we're covering in this section, watch this video. For more details, go to Getting started with New Relic and Terraform. You can also work through these steps with code samples in GitHub and the hands-on workshop in Instruqt.
Automate Configuration with Observability as Code | New Relic
In Terraform, each resource block describes one or more observability objects, such as dashboards, alerts, notification workflows, or workloads. We'll use examples from Resource: newrelic_one_dashboard:
- Create a resource block and declare a type (newrelic_one_dashboard) with a given name (exampledash). The type and the name of the resource together form its identifier, so they must be unique within a module. Here's a simple example for deploying dashboards as code in New Relic, based on Resource: newrelic_one_dashboard.
# New Relic One Dashboard
resource "newrelic_one_dashboard" "exampledash" {
  # The title of the dashboard.
  name = "New Relic Terraform Example"

  # A nested block that describes a page
  page {
    # The name of the page.
    name = "New Relic Terraform Example"

    # A nested block that describes a billboard widget
    widget_billboard {
      title  = "Requests per minute"
      row    = 1
      column = 1
      width  = 6
      height = 3

      # A nested block that describes a NRQL query
      nrql_query {
        query = "FROM Transaction SELECT rate(count(*), 1 minute)"
      }
    }
  }
}
For more details on attributes, see the attribute reference for the newrelic provider in Terraform.
For more details on New Relic Query Language (NRQL), see syntax, clauses, and functions.
- Next, you'll include a variables.tf file in Terraform. You can customize Terraform modules with input variables instead of modifying the source code of the module. That makes it easy to share and reuse modules across other configurations in Terraform. At the end of this section, you'll see an example variables.tf file.
- Finally, you'll combine what we covered about the New Relic provider, the resources, the main.tf file, and the corresponding variables.tf file to deploy dashboards as code.
The next two example main.tf and variables.tf files use concepts described in Google Site Reliability Engineering, The Four Golden Signals: latency, traffic, errors, and saturation. These examples are based on code samples in the Getting Started with the New Relic Provider documentation.
Example main.tf file full code
# get the New Relic Terraform provider
terraform {
  required_version = "~> 1.0"
  required_providers {
    newrelic = {
      source = "newrelic/newrelic"
    }
  }
}

# configure the New Relic provider
provider "newrelic" {
  account_id = var.nr_account_id
  api_key    = var.nr_api_key # usually prefixed with 'NRAK'
  region     = var.nr_region  # Valid regions are US and EU
}

# resource to create, update, and delete dashboards in New Relic
resource "newrelic_one_dashboard" "dashboard_name" {
  name = "O11y_asCode-FoodMe-Dashboards-TF"
  # determines who can see the dashboard in an account
  permissions = "public_read_only"

  page {
    name = "Dashboards as Code"

    widget_markdown {
      title  = "Golden Signals - Latency"
      row    = 1
      column = 1
      width  = 4
      height = 3
      text   = "## The Four Golden Signals - Latency\n---\n#### The time it takes to service a request. It's important to distinguish between the latency of successful requests and the latency of failed requests. \n\n#### For example, an HTTP 500 error triggered due to loss of connection to a database or other critical backend might be served very quickly; however, as an HTTP 500 error indicates a failed request, factoring 500s into your overall latency might result in misleading calculations. \n\n#### On the other hand, a slow error is even worse than a fast error! Therefore, it's important to track error latency, as opposed to just filtering out errors."
    }

    widget_line {
      title  = "Golden Signals - Latency - FoodMe - Line"
      row    = 1
      column = 5
      width  = 4
      height = 3
      nrql_query {
        query = "SELECT average(apm.service.overview.web) * 1000 as 'Latency' FROM Metric WHERE appName like '%FoodMe%' SINCE 30 minutes ago TIMESERIES AUTO"
      }
    }

    widget_stacked_bar {
      title  = "Golden Signals - Latency - FoodMe - Stacked Bar"
      row    = 1
      column = 9
      width  = 4
      height = 3
      nrql_query {
        query = "SELECT average(apm.service.overview.web) * 1000 as 'Latency' FROM Metric WHERE appName like '%FoodMe%' SINCE 30 minutes ago TIMESERIES AUTO"
      }
    }

    widget_markdown {
      title  = "Golden Signals - Errors"
      row    = 4
      column = 1
      width  = 4
      height = 3
      text   = "## The Four Golden Signals - Errors\n---\n\n#### The rate of requests that fail, either explicitly (e.g., HTTP 500s), implicitly (for example, an HTTP 200 success response, but coupled with the wrong content), or by policy (for example, \"If you committed to one-second response times, any request over one second is an error\").\n\n#### Where protocol response codes are insufficient to express all failure conditions, secondary (internal) protocols may be necessary to track partial failure modes. \n\n#### Monitoring these cases can be drastically different: catching HTTP 500s at your load balancer can do a decent job of catching all completely failed requests, while only end-to-end system tests can detect that you're serving the wrong content."
    }

    widget_area {
      title  = "Golden Signals - Errors - FoodMe - Area"
      row    = 4
      column = 5
      width  = 4
      height = 3
      nrql_query {
        query = "SELECT (count(apm.service.error.count) / count(apm.service.transaction.duration)) * 100 as 'Errors' FROM Metric WHERE (appName like '%FoodMe%') AND (transactionType = 'Web') SINCE 30 minutes ago TIMESERIES AUTO"
      }
    }

    widget_billboard {
      title  = "Golden Signals - Errors - FoodMe - Billboard Compare With"
      row    = 4
      column = 9
      width  = 4
      height = 3
      nrql_query {
        query = "SELECT (count(apm.service.error.count) / count(apm.service.transaction.duration)) * 100 as 'Errors' FROM Metric WHERE (appName like '%FoodMe%') AND (transactionType = 'Web') SINCE 30 minutes ago COMPARE WITH 30 minutes ago"
      }
    }

    widget_markdown {
      title  = "Golden Signals - Traffic"
      row    = 7
      column = 1
      width  = 4
      height = 3
      text   = "## The Four Golden Signals - Traffic\n---\n\n#### A measure of how much demand is being placed on your system, measured in a high-level system-specific metric. \n\n#### For a web service, this measurement is usually HTTP requests per second, perhaps broken out by the nature of the requests (e.g., static versus dynamic content). \n\n#### For an audio streaming system, this measurement might focus on network I/O rate or concurrent sessions. \n\n#### For a key-value storage system, this measurement might be transactions and retrievals per second."
    }

    widget_table {
      title  = "Golden Signals - Traffic - FoodMe - Table"
      row    = 7
      column = 5
      width  = 4
      height = 3
      nrql_query {
        query = "SELECT rate(count(apm.service.transaction.duration), 1 minute) as 'Traffic' FROM Metric WHERE (appName LIKE '%FoodMe%') AND (transactionType = 'Web') FACET path SINCE 30 minutes ago"
      }
    }

    widget_pie {
      title  = "Golden Signals - Traffic - FoodMe - Pie"
      row    = 7
      column = 9
      width  = 4
      height = 3
      nrql_query {
        query = "SELECT rate(count(apm.service.transaction.duration), 1 minute) as 'Traffic' FROM Metric WHERE (appName LIKE '%FoodMe%') AND (transactionType = 'Web') FACET path SINCE 30 minutes ago"
      }
    }

    widget_markdown {
      title  = "Golden Signals - Saturation"
      row    = 10
      column = 1
      width  = 4
      height = 3
      text   = "## The Four Golden Signals - Saturation\n---\n\n#### How \"full\" your service is. A measure of your system fraction, emphasizing the resources that are most constrained (e.g., in a memory-constrained system, show memory; in an I/O-constrained system, show I/O). Note that many systems degrade in performance before they achieve 100% utilization, so having a utilization target is essential.\n\n#### In complex systems, saturation can be supplemented with higher-level load measurement: can your service properly handle double the traffic, handle only 10% more traffic, or handle even less traffic than it currently receives? For very simple services that have no parameters that alter the complexity of the request (e.g., \"Give me a nonce\" or \"I need a globally unique monotonic integer\") that rarely change configuration, a static value from a load test might be adequate. \n\n#### As discussed in the previous paragraph, however, most services need to use indirect signals like CPU utilization or network bandwidth that have a known upper bound. Latency increases are often a leading indicator of saturation. Measuring your 99th percentile response time over some small window (e.g., one minute) can give a very early signal of saturation.\n\n#### Finally, saturation is also concerned with predictions of impending saturation, such as \"It looks like your database will fill its hard drive in 4 hours.\""
    }

    widget_line {
      title  = "Golden Signals - Saturation - CPU & Memory - Multi-Queries"
      row    = 10
      column = 5
      width  = 4
      height = 3
      nrql_query {
        query = "SELECT rate(sum(apm.service.cpu.usertime.utilization), 1 second) * 100 as 'cpuUsed' FROM Metric WHERE appName LIKE '%FoodMe%' SINCE 30 minutes ago TIMESERIES AUTO"
      }
      nrql_query {
        query = "SELECT average(apm.service.memory.physical) * rate(count(apm.service.instance.count), 1 minute) / 1000 as 'memoryUsed %' FROM Metric WHERE appName LIKE '%FoodMe%' SINCE 30 minutes ago TIMESERIES AUTO"
      }
    }

    widget_line {
      title  = "Golden Signals - Saturation - Memory - Line Compare With"
      row    = 10
      column = 9
      width  = 4
      height = 3
      nrql_query {
        query = "SELECT average(apm.service.memory.physical) * rate(count(apm.service.instance.count), 1 minute) / 1000 as 'memoryUsed %' FROM Metric WHERE appName LIKE '%FoodMe%' SINCE 30 minutes ago COMPARE WITH 20 minutes ago TIMESERIES AUTO"
      }
    }
  }
}
Example variables.tf file full code
# your unique New Relic account ID
variable "nr_account_id" {
  default = "XXXXX"
}

# your User API key
variable "nr_api_key" {
  default = "XXXXX"
}

# valid regions are US and EU
variable "nr_region" {
  default = "US"
}
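Rather than hardcoding real credentials into these defaults, standard Terraform practice lets you supply values at run time, either with -var flags or with a terraform.tfvars file that Terraform loads automatically; the values below are placeholders:

```shell
# option 1: override variables on the command line
# terraform apply -var="nr_account_id=1234567" -var="nr_region=US"

# option 2: keep the values in terraform.tfvars (loaded automatically)
cat > terraform.tfvars <<'EOF'
nr_account_id = "1234567"
nr_api_key    = "NRAK-XXXXX"
nr_region     = "US"
EOF
```

Keeping terraform.tfvars out of source control also prevents your user API key from leaking into the repository.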
What the final result looks like
Now that you've deployed dashboards as code, your final result should look like this in New Relic: