Why Amazon Reserve Instances Torment Me

We’ve been using over 100 Amazon EC2 instances for a year now, but I’ve only just made my first reserve instance purchase. For the untutored, reserve instances are where you pay a yearly upfront fee per instance and in return get a much, much lower hourly cost. On its face, it’s a good deal – take a normal Large instance you’d use for a database. For a Linux one, it’s $0.34 per hour. Or you can pay $910 up front for the year, and then it’s only $0.12 per hour. So theoretically, it takes your yearly cost from $2978.40 down to $1961.20. A great deal, right?
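The arithmetic behind that comparison is simple enough to sanity-check (using the prices quoted above, and assuming the instance runs all 8,760 hours of the year):

```python
HOURS_PER_YEAR = 365 * 24  # 8760

on_demand_rate = 0.34  # $/hr, Large Linux instance on demand
reserved_rate = 0.12   # $/hr with a one-year reservation
upfront = 910.00       # one-year upfront fee per reserved instance

on_demand_yearly = on_demand_rate * HOURS_PER_YEAR
reserved_yearly = upfront + reserved_rate * HOURS_PER_YEAR

print(f"on demand: ${on_demand_yearly:.2f}")  # on demand: $2978.40
print(f"reserved:  ${reserved_yearly:.2f}")   # reserved:  $1961.20
```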

Well, not so much. The devil is in the details.

First of all, you have to make sure you’re running all those instances all the time. If you buy a reserve instance and then don’t use it some of the time, you immediately start cutting into your savings. The crossover is at 172 days – if you don’t run the instance at least 172 days out of the year, you go upside down on the deal.
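A quick back-of-the-envelope check of that crossover point (same prices as above; this is just arithmetic, not anything Amazon publishes):

```python
upfront = 910.00        # one-year reservation fee
on_demand_rate = 0.34   # $/hr on demand
reserved_rate = 0.12    # $/hr reserved

# Each hour the instance actually runs saves you the rate difference;
# the reservation pays for itself once those savings cover the upfront.
hourly_savings = on_demand_rate - reserved_rate  # $0.22/hr
breakeven_days = upfront / hourly_savings / 24
print(f"{breakeven_days:.0f} days")  # 172 days
```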

But what’s the big deal, you ask?  Sure, in the cloud you are probably (and should be!) scaling up and down all the time, but as long as you reserve up to your low water mark it should work out, right?

The second, bigger problem is that when you reserve instances, you have to specify everything about that instance. You aren’t reserving “10 instances”, or even “10 large instances” – you have to specify:

  • Platform (UNIX/Linux, UNIX/Linux VPC, SUSE Linux, Windows, Windows VPC, or Windows with SQL Server)
  • Instance Type (m1.small, etc.)
  • AZ (e.g. us-east-1b)

And tenancy and term. So you have to reserve “a small multitenant Linux instance in us-east-1b for one year.” But having to specify down to this level is really problematic in any kind of dynamic environment.

Let’s say you buy 10 m1.large instances for your databases, and then you realize later you really need to move up to an m1.xlarge. Well, tough. You can, but unless you have 10 other things to run on those larges, you lose money. The same goes if you decide to change OS. One of our biggest expenditures is our compile farm workers; on those we hope to move from Windows to Linux once we get the software issues worked out, and we’re experimenting with the best cost/performance on different instance sizes. I’m effectively blocked from buying reserve for those, since doing so would put a stop to our ability to innovate.

And more subtly, let’s say you’re doing dynamic scaling and splitting across AZs, like they always say you should for availability purposes. If I’m running 20 instances scaled across 1b and 1c, I’m not guaranteed to always be running 10 in 1b and 10 in 1c – it’s more random than that. So instead of buying 20 reserve, you have to buy, say, 7 in 1b and 7 in 1c to make sure you don’t end up losing money.
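To put a rough number on that hedge, here’s a small simulation (my own sketch, not anything from Amazon): assume the autoscaler places each of 20 instances in 1b or 1c with equal probability, and count how often a per-AZ reservation sits partly idle.

```python
import random

TRIALS = 100_000
TOTAL = 20  # instances running at any given time

def idle_fraction(reserved_per_az, trials=TRIALS):
    """Fraction of trials where at least one AZ runs fewer instances
    than it has reserved (i.e., some reservation goes unused)."""
    idle = 0
    for _ in range(trials):
        in_1b = sum(random.random() < 0.5 for _ in range(TOTAL))
        in_1c = TOTAL - in_1b
        if in_1b < reserved_per_az or in_1c < reserved_per_az:
            idle += 1
    return idle / trials

print(idle_fraction(10))  # roughly 0.82 -- reserving 10+10 almost always wastes some
print(idle_fraction(7))   # roughly 0.12 -- hedging at 7+7 rarely does
```

Reserving the full 10 per AZ leaves reservations idle over 80% of the time, because the split is almost never exactly 10/10; dropping to 7 per AZ cuts that to about one trial in nine, at the cost of paying on-demand rates for the other 6 instances.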

Heck, they even differentiate between Linux, SUSE, and Linux VPC instances, which clearly crosses over into annoyingly picky territory.

As a result of all this, it is pretty undesirable to buy reserve instances unless you have a very stable environment, both technically and scale-wise. That sentence doesn’t describe the typical cloud use case in my opinion.

I understand, obviously, why they are doing this. From a capacity planning standpoint, it’s best for them if they make you specify everything. But what I don’t think they understand is that this cuts into people’s willingness to buy reserve, and reserve is not only upfront money but also a lock-in period, which should be grotesquely attractive to a business. I put off buying reserve for a year because of this, and even now that I’ve done it, I’m not buying nearly as many reserve instances as I could be, because I have to hedge my bets against ANY changes to my service. It seems to me that this also degrades the alleged point of reserves, which is capacity planning – if you’re so picky about it that no one buys reserve and 95% of your instances are on demand, then you can’t plan very well, can you?

What Amazon needs to do is meet customers halfway.  It’s all a probabilities game anyway. They lose specificity of each given reserve request, but get many more reserve requests (and all the benefits they convey – money, lockin, capacity planning info) in return.

Let’s look at each axis of inflexibility and analyze it.

  • Size.  Sure, they have to allocate machines, right?  But I assume they understand they are using this thing called “virtualization.”  If I want to trade in 20 reserved small instances for 5 large instances (each large is 4x a small), why not?  It loses them nothing to allow this; they just have to make the effort to support it in their console/APIs. I can understand needing to reserve a certain number of “units,” but those should be flexible on exact instance types at a given time.
  • OS. Why on God’s green earth do I need to specify OS?  Again, virtualized right? Is it so they can buy enough Windows licenses from Microsoft?  Cry me a river.  This one needs to leave immediately and never come back.
  • AZ. This is annoying from the user POV but probably the most necessary from the Amazon POV, because they have to put enough hardware in each data center, right?  I do think they should make this a per-region rather than a per-AZ limit, so I’m just reserving “us-east” in Virginia and not a specific AZ; that would accommodate all my use cases.
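To make the “units” idea from the first bullet concrete, here is a hypothetical bookkeeping sketch (the unit table and function are my invention, just following the 4x small-to-large ratio mentioned above):

```python
# Hypothetical "reserved units" scheme: a reservation buys capacity units,
# and any mix of instance types that fits within them is covered.
UNITS = {"m1.small": 1, "m1.large": 4, "m1.xlarge": 8}  # assumed ratios

def units_needed(fleet):
    """Total units consumed by a fleet given as {instance_type: count}."""
    return sum(UNITS[t] * n for t, n in fleet.items())

# Trading 20 reserved smalls for 5 larges costs Amazon nothing extra:
print(units_needed({"m1.small": 20}))  # 20
print(units_needed({"m1.large": 5}))   # 20
```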

In the end, one of the reasons people move to the cloud in the first place is to get rid of the constraints of hardware.  When Amazon just puts those constraints back in place, it becomes undesirable. Frankly, even now I tried to just pay Amazon up front rather than actually buy reserve, but they’re not really enterprise-friendly yet from a finance point of view, so in the end I reluctantly bought reserve.  The analytics around it are lacking too – I can’t look in the Amazon console and see “Yes, you’re using all 9 of your large/Linux/us-east-1b instances.”

Amazon keeps innovating greatly in the technology space, but in terms of the customer interaction space they need a lot of help – they’re not the only game in town, and players that are less technically sophisticated but have more aggressive customer service/support options will erode their market share. I can’t help but think that Rackspace’s play of backing OpenStack is the purest example of this – “Anyone can run the same cloud we do, but we are selling our ‘fanatical support’” is the message.

Filed under Cloud

8 responses to “Why Amazon Reserve Instances Torment Me”

  1. Jon

     I agree, and I understand that the reason they do this is to lock in the actual
     system that is guaranteed to be there for you, but the least they could do would
     be to allow a “best effort” swap: let you swap a reserved instance for another
     reserved instance, or a set of reserved instances, if the other instances
     happen to be available.

  2. mmwasser

    I completely agree with the points you have here. There are also a number of interesting consumer-side pains associated with the AWS reserved instance model – e.g. frequently adding instances to the AWS bill doesn’t require approvals, as a management team has already agreed to pay for AWS. However, since RIs are one-time purchases, they often go through more complex approval processes in small and large companies alike. Not sure there’s a resolution there – just another reason why they don’t get used.

    I feel like AWS has gotten themselves into a difficult place with the RI concept as well. With every change they make – the introduction of new instance types, offering levels (light, medium, heavy), and pricing decreases – they make more people regret the choices they’ve already made, and less interested in considering RIs in the future. Last I checked there were 2780 different possible RI configurations and 460 different non-interchangeable configurations in US-East-1 alone. Interestingly, while running a Reserved Instance recommendation/usage analysis tool (www.raveld.com), I’ve seen that 89% of AWS users with more than 5 instances don’t use RIs at all!

    I wonder if there could have been a different, more usable concept that still would have made RIs financially possible. Taking your post a little further: maybe you could buy “RI Units” that translate directly to how many EC2 compute units are available, instead of specific instance configurations. E.g., an m1.small would be 2 units, an m1.large would be 8. Each RI unit would give a direct percentage discount off the hourly rate. You’d still buy units for specific AZs, but within the AZ you could apply them to any configuration. Maybe Windows instances would be 1.5x the RI units or something like that, to cover the Windows license. I imagine different VM instance types are created using different physical hardware configurations – but maybe another selection criterion would be m1, c1, or m2 RI units, etc. I feel that the fewer the options, the more likely people would be to purchase RIs – even if the concept is slightly more abstract.

  3. Is there an argument against using reserved instances for “base capacity” and autoscaling with on-demand? That way, you are able to get a fix on your fixed costs. Thanks.

     • Well sure, that’s the default use. It doesn’t help with the problems if you’re using multiple AZs, OSes, sizes… If you are running “75-100 of exactly the same thing,” then they’re great.

  4. I just got done putting together a huge RI purchase at my new gig. OMG, this process is still hateful. The larger you are, the more quickly you are innovating, and the more teams you have using instances and injecting uncertainty into your plans, the worse this is. My eyes are swimming from looking at Excel spreadsheets trying to pin down what we’re running – which is most instance types in all AZs in three Amazon regions – and figure out what we’ve already reserved, what we’re underwater on, and what we think we can safely reserve… The new “RI Marketplace” has too many limitations to actually use, and it’s a “solution” to a problem of their own making. Bah! OK, vent over, though I may actually post on this again soon.

  5. Pingback: Google Cloud Update | the agile admin
