Cloud computing has been the buzz and hype lately and everybody is trying to understand what it is and how to use it. In this post, I wanted to explore some of the properties of “the cloud” as they pertain to Application Performance Management. If you are new to cloud offerings, here are a few good materials to get started, but I will assume the reader of this post is somewhat familiar with cloud technology.
As Spiderman would put it, “With great power comes great responsibility” … Cloud abstracts some of the infrastructure parts of your system and gives you the ability to scale up resource on demand. This doesn’t mean that your applications will magically work better or understand when they need to scale, you still need to worry about measuring and managing the performance of your applications and providing quality service to your customers. In fact, I would argue APM is several times more important to nail down in a cloud environment for several reasons:
- The infrastructure your apps live in is more dynamic and volatile. For example, cloud servers are VMs and resources given to them by the hypervisor are not constant, which will impact your application. Also, VMs may crash/lock up and not leave much trace of what was the cause of issues with your application.
- Your applications can grow bigger and more complex as they have more resources available in the cloud. This is great but it will expose bottlenecks that you didn’t anticipate. If you can’t narrow down the root cause, you can’t fix it. For this, you need scalable APM instrumentation and precise analytics.
- On-demand scaling is a feature that is touted by all cloud providers but it all hinges on one really hard problem that they won’t solve for you: Understanding your application performance and workload behave so that you can properly instruct the cloud scaling APIs what to do. If my performance drops do I need more machines? How many? APM instrumentation and analytics will play a key part of how you can solve the on-demand scaling problem.
So where is APM for the cloud, you ask? It is in its very humble beginnings as APM solution providers have to solve the same gnarly problems application developers and operations teams are struggling with:
- Cloud is dynamic and there are no fixed resources. IP addresses, hostnames, and MAC addresses change, among other things. Properly addressing, naming and connecting the machines is a challenge – both for deep instrumentation agent technology and for the apps themselves.
- Most vendors’ licensing agreements are based on fixed count of resources billing rather than a utility model, and are therefore difficult to use in a cloud environment. Charging a markup on the utility bill seems to be a common approach among the cloud-literate but introduces a kink in the regular budgeting process and no one wants to write a variable but sizable blank check.
- Well the cloud scales… a lot. The instrumentation / monitoring technology has to be low enough overhead and be able to support hundreds of machines.
On the synthetic monitoring end there are plenty of options. We selected AlertSite, a distributed synthetic monitoring SaaS provider similar to Keynote and Gomez, for simplicity but in only provides high level performance and availability numbers for SLA management and “keep the lights are on” alerting. We also rely on CloudKick, another SaaS provider, for the system monitoring part. Here is a more detailed use case on the Cloudkick implementation.
For deeper instrumentation we have worked with OPNET to deploy their AppInternals Xpert (former Opnet Panorama) solution in the Amazon EC2 cloud environment. We have already successfully deployed AppInternals Xpert on our own server infrastructure to provide deep instrumentation and analysis of our applications. The cloud version looks very promising for tackling the technical challenges introduced by the cloud, and once fully deployed, we will have a lot of capability to harness the cloud scale and performance. More on this as it unfolds…
In summary, as vendors sell you the cloud but be prepared to tackle the traditional infrastructure concerns and stick to your guns. No, the cloud is not fixing your performance problems and is not going to magically scale for you. You will need some form of APM to help you there. Be on the lookout for providers and do challenge your existing partners to think of how they can help you. The APM space in the cloud is where traditional infrastructure was in 2005 but things are getting better.
To the cloud!!! (And do keep your performance engineers on staff, they will still be saving your bacon.)