Organizing Your Graphite Metrics

2012-05-09 22:13:10 by jdixon

One of the most common questions I get from Graphite users is how best to name and/or organize metric paths. I don't have an exhaustive list of "best practices" but I'd like to share some basic insights I've accumulated.

Misaligned paths are ok. I used to be tempted to try and keep different paths aligned in order to ease correlation of related targets within a graph. Fortunately there are plenty of helpful aliasing functions (and wildcards) to help tame unruly paths.
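As a rough illustration of that point, here's a minimal Python sketch that builds a render URL combining a wildcard with aliasByNode, so each series gets labeled by a single path segment no matter how the rest of the path is laid out. The metric path servers.*.cpu.load and the render_url helper are made up for the example, and https://graphite is just the placeholder host used elsewhere on this page.

```python
from urllib.parse import urlencode

def render_url(base, target, params=None):
    """Build a Graphite render URL for one target (hypothetical helper)."""
    query = {"target": target}
    query.update(params or {})
    return base.rstrip("/") + "/render/?" + urlencode(query)

# Hypothetical metric path; node 1 (counting from 0) is the wildcard segment,
# so every series in the legend is named after its server.
target = "aliasByNode(servers.*.cpu.load, 1)"
url = render_url("https://graphite", target, {"from": "-1h"})
print(url)
```

Paths of different depths can each get their own target with a different node index, which is usually less work than renaming the metrics themselves.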

Read the rest of this story...

The Story Behind Tasseo

2012-05-07 10:19:32 by jdixon

A little over a week ago I released the Tasseo dashboard. The response I got back was nothing short of astonishing. Tasseo is a Graphite dashboard, one of many to have been released in recent months. That fact alone led me to believe it would fly quietly under the radar. I couldn't have been more wrong; Tasseo (pronounced like Casio) tallied over 200 GitHub watchers in the first weekend, and should pass 300 today.

Tasseo was originally developed as a from-the-ground-up reimplementation of the Pulse dashboard we use at Heroku. Pulse has been a tremendously valuable tool for us; unfortunately, it has some drawbacks that make it a challenge to maintain.

Read the rest of this story...

A Precautionary Tale for Graphite Users

2012-05-02 22:09:36 by jdixon

This morning I was collecting some graphs for one of our weekly status meetings. Asked to find something that represented the state of our Graphite system, I naturally gravitated to my usual standbys, "Carbon_Performance" (top) and "Carbon_Inbound_Bandwidth" (bottom).


The SysAdmin in me loves these because they highlight resource utilization on the server. While the former details disk I/O and CPU, the latter tracks inbound bandwidth in terms of bits and packets per second. Although the network graph seems utterly boring (inasmuch as we've all used these in one form or another, from vendor-supplied dashboards to Cacti installations), it's this one that is actually the more complicated of the two to configure.

Read the rest of this story...

Unhelpful Graphite Tip #10

2012-04-25 08:44:44 by jdixon

Let's say you want to see how a particular metric compares to some point in the past. This is a common practice in troubleshooting and capacity planning. What's the best way to achieve this in Graphite?

I might start off by selecting the past four weeks and visually discern the trends from week to week. Here's a graph showing the last month of AMQP activity. We can see that traffic was oscillating quite a bit over the first week and a half before smoothing out and gradually trending downward.

Read the rest of this story...

Unhelpful Graphite Tip #9

2012-04-19 08:24:20 by jdixon

I love that Graphite can support per-second resolution. We've started to use it more frequently with applications that emit a constant stream of metrics to one of our aggregators. But there are times when an application might send updates less frequently, or when transient failures or network congestion result in lost metrics. In this case it makes sense to adjust your xFilesFactor value.
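To make the effect of xFilesFactor concrete, here's a small Python sketch of the rollup rule as I understand it: when Whisper aggregates a group of fine-grained points into one coarser point, it only stores a value if the fraction of known (non-null) points is at least xFilesFactor. The function name rollup_average is made up for illustration, and it only models the average aggregation method.

```python
def rollup_average(points, x_files_factor=0.5):
    """Average one group of points, or return None if too few are known.

    `points` is the list of higher-resolution values that fall into a single
    lower-resolution slot; missing values are represented as None.
    """
    known = [p for p in points if p is not None]
    if not points or len(known) / len(points) < x_files_factor:
        return None  # not enough data: the coarser slot stays empty
    return sum(known) / len(known)

# Two of four points known: 0.5 >= 0.5, so the rollup is kept.
print(rollup_average([1.0, None, 3.0, None]))
# One of four points known: 0.25 < 0.5, so the rollup is dropped...
print(rollup_average([1.0, None, None, None]))
# ...unless xFilesFactor is lowered for metrics that report sparsely.
print(rollup_average([1.0, None, None, None], x_files_factor=0.25))
```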

You may remember my last post, which mentioned the whisper-info.py utility. It helps you extract metadata from your Whisper files. Take, for example, a Whisper file for one of our collectd metrics:

$ sudo whisper-info.py /data/whisper/collectd/63694/swap/used.wsp

maxRetention: 31536000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 534580

Archive 0
retention: 86400
secondsPerPoint: 60
points: 1440
size: 17280
offset: 52

...
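The numbers in that output can be cross-checked with a little arithmetic. This sketch assumes Whisper's usual on-disk layout (16 bytes of file metadata, 12 bytes of header per archive, and 12 bytes per stored point); the constant names are mine.

```python
POINT_SIZE = 12         # bytes per stored point (4-byte timestamp + 8-byte value)
METADATA_SIZE = 16      # bytes of file-level metadata
ARCHIVE_INFO_SIZE = 12  # bytes of header per archive

# Values copied from the "Archive 0" section above.
retention = 86400
seconds_per_point = 60
offset = 52

points = retention // seconds_per_point   # one day of one-minute points
size = points * POINT_SIZE                # bytes used by this archive
# Archive 0 starts right after the header, so its offset reveals how many
# archive headers precede it.
archive_count = (offset - METADATA_SIZE) // ARCHIVE_INFO_SIZE

print(points, size, archive_count)
```

Under those assumptions the file holds three archives, two of which are hidden behind the ellipsis in the output.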

Read the rest of this story...

Unhelpful Graphite Tip #8

2012-04-18 10:59:38 by jdixon

If you've mucked around with your Whisper storage policies or needed to migrate your data to/from Graphite, there's a good chance you've used some of the bin scripts like whisper-info.py and whisper-fetch.py. Unfortunately there are some drawbacks with whisper-fetch.py, most notably that it only fetches content from the first archive to match the requested time period, and it won't return the original raw data after the rollup policies take effect.

Enter whisper-dump.py, a contributed script from Amos Shapira (@amosshapira). This tool dumps the original data for all archives (storage retention periods) as well as the metadata normally retrieved using whisper-info.py.

$ whisper-dump.py elapsed.wsp > dump

$ head -25 dump
Meta data:
  aggregation method: average
  max retention: 31536000
  xFilesFactor: 0.5

Archive 0 info:
  offset: 40
  seconds per point: 1
  points: 300
  retention: 300
  size: 3600

Archive 1 info:
  offset: 3640
  seconds per point: 60
  points: 525600
  retention: 31536000
  size: 6307200

Archive 0 data:
0: 1334532666, 0.0271825
1: 1334532367, 0.0139948
2: 1334532668, 0.0107801
3: 1334532669, 0.0124356
4: 1334531470, 0.0185704
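
For the curious, here's a minimal Python sketch of how a dump like this can be read straight off disk. It assumes the Whisper layout of big-endian file metadata, one small header per archive, and fixed-size (timestamp, value) points. The names read_header and read_archive are my own and not part of the Whisper tools; for real work, stick with whisper-dump.py.

```python
import io
import struct

# Assumed Whisper layout, all big-endian:
METADATA = struct.Struct("!2LfL")    # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO = struct.Struct("!3L")  # offset, seconds per point, points
POINT = struct.Struct("!Ld")         # timestamp, value

def read_header(fh):
    """Return file metadata plus one dict per archive."""
    fh.seek(0)
    agg_type, max_retention, xff, count = METADATA.unpack(fh.read(METADATA.size))
    archives = []
    for _ in range(count):
        offset, step, points = ARCHIVE_INFO.unpack(fh.read(ARCHIVE_INFO.size))
        archives.append({"offset": offset, "seconds_per_point": step,
                         "points": points, "size": points * POINT.size})
    return {"aggregation_type": agg_type, "max_retention": max_retention,
            "x_files_factor": xff, "archives": archives}

def read_archive(fh, archive):
    """Return every (timestamp, value) slot of one archive, in file order."""
    fh.seek(archive["offset"])
    raw = fh.read(archive["size"])
    return [POINT.unpack_from(raw, i * POINT.size) for i in range(archive["points"])]

# Tiny in-memory stand-in for a .wsp file: one archive holding three of the
# points from the dump above. The aggregation type code 1 is an assumption.
points = [(1334532666, 0.0271825), (1334532367, 0.0139948), (1334532668, 0.0107801)]
offset = METADATA.size + ARCHIVE_INFO.size
blob = METADATA.pack(1, 3, 0.5, 1) + ARCHIVE_INFO.pack(offset, 1, 3)
blob += b"".join(POINT.pack(ts, value) for ts, value in points)

fh = io.BytesIO(blob)
header = read_header(fh)
for slot, (ts, value) in enumerate(read_archive(fh, header["archives"][0])):
    print(f"{slot}: {ts}, {value}")
```

With two archives the header works out to 16 + 2 × 12 = 40 bytes, which matches the offset of Archive 0 in the dump. The out-of-order timestamps in slots 1 and 4 are what you'd expect if each archive is a circular buffer: those slots still hold values from an earlier pass, exactly 300 and 1200 seconds older than their neighbors would suggest.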

Unhelpful Graphite Tip #7

2012-04-15 19:01:26 by jdixon

If you're logged into Graphite as an authenticated user you have the option of saving graphs, which will appear under the "My Graphs" folder in the navigation tree to the left. There are some limitations (you can't include spaces in the filename) but it's otherwise a useful feature for saving and sharing graphs with others.

Unknown to some users, Graphite's dot-delimited naming schema is not only available in metrics, but in saved graph names as well. Once you've created or modified a graph, click the Save button (floppy disk icon)...

Enter a dot-delimited path prefix to create a folder. In the following example we're creating a folder named nginx with a graph named conntrack. For additional nested folders, add more path segments.

Graphite will save the graph metadata to the database and pop up a dialog confirming the action.

Reload the Graphite window and you'll see the new directory structure.

Graphite Script for Campfire Hubot

2012-04-13 23:42:57 by jdixon

We use Campfire extensively at $DAYJOB. As our Ops team is 100% remote, it's become indispensable for us. Although it has some minor warts (lack of proper timestamps) it works quite well as a chat medium and collaboration tool. Because of its popularity, there are tons of plugins available, not the least of which is Hubot, a bot written by GitHub specifically for Campfire.

Hubot supports a wide variety of commands, from useful ones that retrieve Google Maps and Images, to more frivolous examples like PugMe. So it was only a matter of time before I saw the potential for Hubot to be used as a mechanism for sharing Graphite data. After dragging my feet for months, I'm proud to bring you the Graphite script for Hubot.

The usage is simple enough; it only does two things. You can search for saved graphs, returning a list of dot-delimited graph identifiers. It does this by recursively crawling the composer graph directory and returning any matches. Pass one of these names to show and it will display the matching graph in your Campfire room.

Attentive readers might wonder why the bot above is named Merman. This is a private fork of Hubot that's been slightly modified for our own tastes. Never fear, this plugin should work fine with the mainstream Hubot.

I have more interesting ideas for this bot, time permitting. For now, give it a spin and let me know what you think. Pull requests are always welcome.
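
The search behavior described above (recursively crawling a tree of saved graphs and returning dot-delimited names) can be sketched in a few lines of Python. This is not the plugin's code. The nested dict is a hypothetical stand-in for whatever Graphite returns for the saved graph tree, and find_graphs is a name I made up.

```python
def find_graphs(tree, query, prefix=""):
    """Recursively collect dot-delimited names of saved graphs matching query."""
    matches = []
    for name, child in sorted(tree.items()):
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict):   # a folder: descend into it
            matches.extend(find_graphs(child, query, path))
        elif query in path:           # a graph: keep it if the name matches
            matches.append(path)
    return matches

# Hypothetical saved-graph tree: folders are dicts, graphs map to a placeholder.
saved = {
    "nginx": {"conntrack": "<graph url>", "requests": "<graph url>"},
    "amqp": {"activity": "<graph url>"},
}
print(find_graphs(saved, "nginx"))
```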

Unhelpful Graphite Tip #6

2012-04-13 09:57:32 by jdixon

I remember one day when I was trying to narrow down an application causing high load on an outlier within a fleet of servers. Nagios wasn't suitable for the task, as it only told me which hosts were currently spiking, not which ones had been spiking for a certain window of time. And it certainly couldn't identify a particular host based on a performance visualization.

My Graphite wizard hat went on and I went to work, narrowing down the list of suspects using wildcards and visually inspecting each host's load profile. Within 5 minutes I found my suspect and basked in my glory.

Naturally my brilliance was short-lived.

Read the rest of this story...

Unhelpful Graphite Tip #5

2012-04-12 15:32:40 by jdixon

Artur Bergman (@crucially) kindly recommends:

Editor's Note: Seriously though, you really should move your Whisper files over to SSD if you haven't already. The IO gain is tremendous and allows you to spend your time being more creative with process distribution across CPU cores (hint: future article).

Unhelpful Graphite Tip #4

2012-04-12 08:17:43 by jdixon

If you're not already aware, Graphite uses Django as the web framework for its underpinnings. In particular, it relies on Django for all user administration, authentication and authorization facilities. This is convenient for Graphite developers, but can be rather inconvenient for Graphite administrators with little-to-no Django experience.

One of my earliest headaches with automating Graphite installations was trying to work around the interactive manage.py syncdb step from the installation doc. This is usually something everyone wants to run, since it performs the initial admin user creation.

Read the rest of this story...

Unhelpful Graphite Tip #3

2012-04-11 10:06:13 by jdixon

I love JSON. No really, I fucking love JSON. It might have something to do with its phonetic approximation to my own name. Or it might be my preference for anything that hastens the death of XML. Either way, it's a handy format that's become ubiquitous for data interchange. And fortunately for those of us who prefer our graphs rendered client-side, Graphite supports it as an output format.

Add format=json to any query and Graphite will magically convert your PNG output to JSON. Better yet, grab the URL with curl and pipe it to your favorite parser and bam, instant grits.

$ curl -s "https://graphite/render/?target=metric.foo&from=-5sec&format=json" | \
  python -mjson.tool

[
    {
        "datapoints": [
            [
                3.4285714285714288, 
                1334117659
            ], 
            [
                3.4285714285714288, 
                1334117660
            ], 
            [
                3.4285714285714288, 
                1334117661
            ], 
            [
                3.4285714285714288, 
                1334117662
            ], 
            [
                3.4285714285714288, 
                1334117663
            ]
        ], 
        "target": "metric.foo"
    }
]
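
If you'd rather consume that output from a script, here's a minimal Python sketch. It assumes the response shape shown above, where each datapoint is a [value, timestamp] pair. The sample body adds one null value, which isn't in the output above, to show how gaps can be skipped. The summarize helper is my own name.

```python
import json

# Same shape as the response above, shortened, with one null added on purpose.
body = """
[{"target": "metric.foo",
  "datapoints": [[3.4285714285714288, 1334117659],
                 [3.4285714285714288, 1334117660],
                 [null, 1334117661]]}]
"""

def summarize(series):
    """Count and average the non-null values of one series."""
    values = [v for v, _ts in series["datapoints"] if v is not None]
    mean = sum(values) / len(values) if values else None
    return {"target": series["target"], "count": len(values), "mean": mean}

for series in json.loads(body):
    print(summarize(series))
```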

STOP! I can hear you now. You're thinking "What the hell man, that was actually helpful." Not to worry, I've got you covered. Take that output and send it to Zach Holman's spark tool. Now you have some seriously unhelpful trending data. From the command-line. You're welcome.

$ curl -s "https://graphite/render/?target=metric.foo&from=-30min&format=json" | \
  python -mjson.tool | grep ',' | grep -v '\]' | spark



▂▅▇▄█▃▆▆▅▇▂▁▂▁▂▁▁▅▇▄▃▃▃▂▂▂▁▂▂▃▁▂▃▁▁▁▂▃▅▄▂▁▂▂▃▃▃▂▄▆▃
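
If the grep pipeline feels too fragile, the same idea can be sketched in Python. This is written in the spirit of spark and makes no attempt to reproduce its exact scaling. The function names are mine, and the input is assumed to be the parsed JSON from a format=json render call.

```python
TICKS = "▁▂▃▄▅▆▇█"

def values_from_render(series_list):
    """Flatten parsed render JSON into a list of non-null values."""
    return [v for series in series_list
            for v, _ts in series["datapoints"] if v is not None]

def sparkline(values):
    """Map each value onto one of eight bar heights, scaled from min to max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat series
    top = len(TICKS) - 1
    return "".join(TICKS[round((v - lo) / span * top)] for v in values)

# Made-up sample data in the render JSON shape.
sample = [{"target": "metric.foo",
           "datapoints": [[0, 1], [3, 2], [None, 3], [7, 4], [2, 5]]}]
print(sparkline(values_from_render(sample)))
```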

Unhelpful Graphite Tip #2

2012-04-10 18:58:26 by jdixon

I wish I could say I've been using this little gem for years. Alas, I just learned about it last night courtesy of R. Tyler Croy (@agentdero). This has already been a godsend, in less than one full day of use.

Save the following snippet as a bookmark. I have it right next to the Graphite link in my bookmark bar.

javascript:url=prompt("Enter Url");
if (url){content.Composer.loadMyGraph("temp", url);};

The next time someone sends you the link to a graph, load up your Graphite composer and then hit that bookmarklet. Paste in the link and click Ok.

Voila, the link is now visible within your composer window. Enjoy!

Unhelpful Graphite Tip #1

2012-04-10 00:41:02 by jdixon

I'd like to begin sharing more of my knowledge as it pertains to using Graphite in production. Most of these upcoming posts are bound to be of the "check out this cool function" variety, but hopefully you can stitch them together into something useful. Before I proceed, I'd like to thank Chris Davis and the team at Orbitz who started this incredible software project and released it to the open-source community. Without your work I'd be stuck using something... less awesome.

Today's tip comes courtesy of a combined effort by me and Michael Leinartas (@mleinart). I've used this particular combination of functions before to calculate the number of "events" in a series during a particular timeframe. Unfortunately I failed to record this query anywhere (pro-tip: save your best Graphite functions in a document or gist, you'll be glad you did) although I had a vague idea of the functions needed. Michael was kind enough to remind me of the particular order for chaining the functions.

Read the rest of this story...

Trending with Purpose

2011-03-18 13:52:44 by jdixon

I threw together a presentation on short notice this week for an internal tele-conference about Trending with Purpose. The end result was much better than I might have expected (even given my penchant for procrastinating). Although much of the content is specific to applications currently in use at $DAYJOB, I think there's something to take out of it even if you're not using these tools.

The content is intended for developers who might not use (or know how to use) application profiling data to complement their operations' monitoring and trending efforts. Special props to the Orbitz.com developers for open-sourcing their Graphite graphing tool, as well as John Allspaw and the Etsy Engineering team for their work on StatsD, and for generally serving as innovators in the Web Operations industry.

Special note: These slides were thrown together in rapid fashion. Anyone who experiences violent reactions to Gill Sans Italic should not download this slideshow. You have been warned.

The slides are available here.