Ah, observability, the new buzzword of the day. Monitoring vendors aplenty are using the word to basically mean “better monitoring!” You know, #monitoringlove not #monitoringsucks. Because monitoring doesn’t help with debugging and doesn’t have app instrumentation, right?
Well, I have to say “bah” to that. So here’s the thing. I’m an electrical engineer by education, and I spent a lot of time working at National Instruments, an engineering test and measurement company. You may be surprised to know these terms have actual definitions that don’t require Twitter arguments to discover.
Monitoring is an activity you perform. It’s simply observing the state of a system over a period of time.
Why do we monitor? For three reasons, in general.
- Problem Detection – you know, alerting, or seeing issues on dashboards.
- Problem Resolution – root cause and troubleshooting.
- Continuous Improvement – capacity planning, financial planning, trending, performance engineering, reporting.
You can instrument metrics or events; metrics have a given sampling frequency and resolution…
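To make sampling frequency and resolution concrete, here is a minimal sketch in plain Python. It assumes a hypothetical per-second series (say, CPU percent) with one short spike, and contrasts polling a metric at different intervals, rolling it up into coarser buckets, and emitting a discrete event whenever the value crosses a threshold. The function names and the threshold are illustrative assumptions, not any tool’s API.

```python
def sample(series, interval):
    """Point-sample a per-second series every `interval` seconds, like a polled gauge."""
    return series[::interval]


def rollup(series, window):
    """Average a per-second series into `window`-second buckets (coarser resolution)."""
    buckets = [series[i:i + window] for i in range(0, len(series), window)]
    return [sum(b) / len(b) for b in buckets]


def events_above(series, threshold):
    """Hypothetical event stream: one (second, value) record per threshold crossing."""
    return [(t, v) for t, v in enumerate(series) if v > threshold]


# Hypothetical per-second readings with a one-second spike at t=3.
series = [10, 10, 10, 95, 10, 10, 10, 10]

print(sample(series, 1))        # every reading, spike included
print(sample(series, 4))        # [10, 10]: the spike is never observed
print(rollup(series, 4))        # [31.25, 10.0]: the spike is smeared into an average
print(events_above(series, 50)) # [(3, 95)]: the event keeps its exact timestamp
```

The trade-off shown is the usual one: metrics are cheap and regular but lose detail between samples or inside buckets, while events preserve each occurrence at the cost of volume.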
So what is observability? This isn’t a new term. It comes from control theory. You know, the stuff that makes your A/C system and electrical plants and your car work.
Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
Observability is a property of a system. You can monitor a system using various instrumentation, but if the system doesn’t externalize its state well enough that you can figure out what’s actually going on in there, then you’re stuck.
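In control theory the definition has a precise test for the textbook case of a linear time-invariant system dx/dt = Ax, y = Cx with n states: it is observable exactly when the stacked matrix [C; CA; CA^2; …; CA^(n-1)] has rank n (the Kalman rank condition). The sketch below checks that condition in plain Python with exact fractions. The toy system is an assumption for illustration: the state is position and velocity, and we compare exposing position as the output with exposing only velocity. Real software systems are not linear ODEs, so treat this as an analogy for “which outputs let you infer the internal state,” not a method for scoring services.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(m):
    """Matrix rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def observability_matrix(A, C):
    """Stack C, CA, CA^2, ..., CA^(n-1) for an n-state system."""
    blocks, block = [], C
    for _ in range(len(A)):
        blocks.extend(block)
        block = mat_mul(block, A)
    return blocks


def is_observable(A, C):
    """Kalman rank condition: observable iff the observability matrix has rank n."""
    return rank(observability_matrix(A, C)) == len(A)


# Toy state [position, velocity]: position changes at rate `velocity`.
A = [[0, 1], [0, 0]]
print(is_observable(A, [[1, 0]]))  # True: from position over time you can infer velocity
print(is_observable(A, [[0, 1]]))  # False: velocity alone never reveals position
```

Same system, different externalized output, different answer: observability depends on what the system exposes, not on how hard you stare at it.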
So is observability hippy bullcrap? No, of course not. In a DevOps world, it’s very important that the apps and systems concentrate on making themselves both observable and controllable (I leave it to the reader to research controllability, unless I get agitated enough to post about that too). Do you make yourself “easy to monitor”?
Externalizing custom metrics contributes to observability (you know, like with Dropwizard Metrics). So does good logging. So does proper architecture! Take a system that sticks all kinds of messages into one message queue rather than using separate queues for separate types – the latter is more observable, because you can readily see how many messages of each type are flowing through. (It’s more controllable too, as you can shut off one queue or another.)
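A minimal sketch of the queue example, using Python’s standard `queue` module. The message types and class names are hypothetical; the point is only that a single shared queue externalizes one number (total depth), while per-type queues externalize depth per type and give you a per-type off switch.

```python
import queue

# Hypothetical message types; the scenario above does not name any.
MESSAGE_TYPES = ("order", "invoice", "email")


class SharedQueue:
    """All message types share one queue: from outside you only see total depth."""

    def __init__(self):
        self.q = queue.Queue()

    def publish(self, msg_type, body):
        self.q.put((msg_type, body))

    def depth(self):
        # qsize() is approximate under concurrent access; fine for this sketch.
        return self.q.qsize()


class PerTypeQueues:
    """One queue per message type: depth is visible per type, and each can be shut off."""

    def __init__(self, types):
        self.queues = {t: queue.Queue() for t in types}
        self.paused = set()

    def publish(self, msg_type, body):
        if msg_type in self.paused:
            return False  # controllability: this flow has been shut off
        self.queues[msg_type].put(body)
        return True

    def depths(self):
        return {t: q.qsize() for t, q in self.queues.items()}

    def pause(self, msg_type):
        self.paused.add(msg_type)


shared, split = SharedQueue(), PerTypeQueues(MESSAGE_TYPES)
for t in ("order", "order", "email"):
    shared.publish(t, "payload")
    split.publish(t, "payload")

print(shared.depth())  # 3: but 3 of what?
print(split.depths())  # {'order': 2, 'invoice': 0, 'email': 1}
```

Nothing about the monitoring tool changed between the two designs; only the architecture did, which is the point of the paragraph above.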
Making your system observable is therefore important, so that if you monitor it with appropriate instrumentation, you understand the state of the system and can make short- or long-term plans to change it.
While a monitoring tool can definitely contribute to this through innovation in instrumentation, analysis, and visualization, observability is largely a battle won or lost before you start sticking tools on top of the system. It’s very important to take it into account when designing and implementing services. No tool is going to “give you” observability; that claim is the usual silver-bullet fallacy from someone who wants to sell you something.
I’m not saying every vendor is using the term wrongly (in fact I just came across this New Relic post that is very well done), but I have to say I am less than impressed when common engineering terms are so widely misused and misunderstood in our industry.
Would you like to know more? Peco and I are working on a new lynda.com course on monitoring and observability! There’ll be real engineering, a broad canvas of the different kinds of monitoring instrumentation, tips on implementation and use… We’ve both been using and/or building monitoring tools for decades now, so we hope to have some useful info for you.