Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at the New York ColdFusion User Group (Nov. 2009) with: Pete Freitag

A Graph Should Answer One Question And Should Answer It Well

By Ben Nadel on
Tags: Work

For years, Uncle Bob Martin has been telling the software development community that Functions should do one thing; and, should do it well. To me, this advice is about focus and cohesion. I really like this mentality and I believe that it applies to much more than just programming. But, as of late, I've been thinking about this advice in terms of logging metrics. Or, more specifically, in terms of graphing metrics. The more I noodle on this, the more I believe that a graph should answer one question; and, should answer it well.

To be honest, it's not always easy to understand what question is being answered by a graph. Generally speaking, graphs are about system health. So, it's easy to fall into the trap of thinking that a graph is answering the question, "Is the system healthy?". But, this question is too broad and provides far too much wiggle room. After all, I could include every recorded metric in the same graph and claim that it offers an at-a-glance view of the system health.

Clearly, that would be insane and unusable. But, it demonstrates the slippery slope of broad questions. To help narrow in on the right question, perhaps we can leverage one of the red flags that we use in software development: the word "And". If a graph is answering this question AND that question, the graph is doing too much.

For example, imagine this graph:

What is the 95th-percentile response time for these end-points AND how does that compare to the previous month?

Such a graph might look like this:

[Figure: A graph should answer one question, and answer it well.]

It's a relatively simple graph, but the use of "AND" in the question indicates that this graph is likely trying to do too much. I think it would be better for the graph to answer either one of those questions, but not both. For example, we could create a graph that answers just one question:

How does the 95th-percentile response time for these end-points compare to the prior month?

Such a graph might look like this:

[Figure: A graph should answer one question.]

Obviously, this is the exact same graph; but, by narrowing the scope of the question - focusing purely on the month-over-month delta - we can change the format of the graph without losing any information that the question actually cares about. For example, the same data could be rendered like this:

[Figure: When a graph answers only one question, it can be formatted to answer that question most effectively.]

In this version, we can't see the 95th-percentile response time in absolute terms; but, we can clearly see that it is increasing over time, and that there is a distinct tipping point in that behavior. In fact, I would argue that this latter graph actually does a better job of answering the question since it removes the noise of two overlapping lines and consolidates them into a single meaningful, easy-to-understand metric.
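To make that month-over-month transformation a bit more concrete, here's a minimal Python sketch. It assumes raw response times in milliseconds, bucketed by day-of-month; it uses the nearest-rank method for the 95th percentile (one of several common percentile definitions, and monitoring tools may interpolate differently); and it defines the "delta" as a simple pointwise difference between the two months. The function names are hypothetical, not taken from any particular monitoring tool.

```python
import math
from typing import Dict, List, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() requires at least one sample")
    ordered = sorted(samples)
    # Nearest-rank: 1-indexed rank = ceil(pct * n / 100). Multiplying before
    # dividing keeps the arithmetic exact when pct is an integer.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def daily_p95(samples_by_day: Dict[int, List[float]]) -> Dict[int, float]:
    """Collapse raw response times (ms), keyed by day-of-month, into one p95 value per day."""
    return {day: percentile(times, 95) for day, times in samples_by_day.items()}


def month_over_month_delta(
    current: Dict[int, float], prior: Dict[int, float]
) -> Dict[int, float]:
    """Pointwise p95 change (ms) for days present in both months; positive means slower."""
    return {day: current[day] - prior[day] for day in sorted(current) if day in prior}


# Hypothetical sample data: two days of response times for each month.
prior = daily_p95({1: [100, 120, 130, 400], 2: [110, 115, 125, 300]})
current = daily_p95({1: [105, 125, 140, 450], 2: [150, 160, 170, 600]})
print(month_over_month_delta(current, prior))  # {1: 50, 2: 300}
```

The two absolute series (this month and last month) collapse into a single series of differences, which is the one number the month-over-month question is actually asking about.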

The logging and monitoring of a system's behavior is ultimately about taking action. Do I need to do something immediately in order to alleviate a problem? Do we need to roll back a deployment that caused a sudden performance decrease? How much longer can this rate of increase be sustained before we need to refactor the application? Can this feature be end-of-lifed due to lack of interest? These are specific questions and they should be answered by graphs that do one thing; and, do it well.
