by Ernest Mueller | January 19, 2012 · 3:33 pm

Why Does Cloud Load Balancing Suck?

Back in the old world of real infrastructure, we used NetScalers or F5s and we were happy. Now in the cloud, you have several options, all of which seem to have problems.

1. Open source. But once you want SSL, and redundancy, and HTTP compression, you get people saying with a straight face: “nginx (for HTTP compression) –> Varnish cache (for caching) –> HTTP-level load balancer (HAProxy, or nginx, or the Varnish built-in) –> webservers.” (Quoted from Server Fault.) That’s like four levels, often with the same software in it twice. And don’t forget some kind of heartbeat between the two front ends. Oh look, I’ve spent $150/mo just on machines to run my load balancing. And I really want to load balance/fail over between all my tiers, not just the front end. It’s a lot of software parts to go wrong.

2. Zeus. For some reason none of the other LB vendors have gotten off their happy asses and delivered a good software load balancer you can use in Amazon. I got tired of talking to our NetScaler reps about it after the first couple of years. They’re more interested in selling their hardware to the cloud data centers than in helping real people load balance their apps. Zeus is the only one, and it’s really quite expensive.

3. Amazon ELBs. These just have a lot of problems under the hood. We’ve been engaged with Amazon ELB product management on them: large files are served super slowly, and users get hits refused due to throttling/changes during ELB scaling. Basically, if you want 100% of your hits to come through, you can’t use them.

4. Geo-IP load balancing, through Dyn or whoever. They claim to have the failover problem fixed, but it still only works for the front end tier of what is a multitier architecture. I certainly don’t want to have to advertise every internal IP in external DNS to make load balancing work.
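
DNS-based failover is also slow to take full effect, because clients keep using a cached answer until its TTL expires. The sketch below is a back-of-the-envelope model, not a description of how Dyn or any other provider works. It assumes resolvers honor the TTL exactly and that cached answers were fetched at uniformly random times; the 60-second detection delay and the 300-second TTL in the example are hypothetical values.

```python
def worst_case_failover_s(ttl_s: float, detection_s: float) -> float:
    """Longest time a client may keep resolving to a dead front end.

    Assumes the provider needs `detection_s` seconds to notice the failure and
    swap the record, and that every resolver then honors the full TTL.
    """
    return detection_s + ttl_s


def fraction_still_stale(t_s: float, ttl_s: float) -> float:
    """Fraction of cached answers still pointing at the old address
    `t_s` seconds after the record changes.

    Assumes cache entries were fetched at uniformly random times during the
    TTL window before the change, so they expire uniformly over [0, ttl_s].
    """
    if t_s <= 0:
        return 1.0
    return max(0.0, 1.0 - t_s / ttl_s)


if __name__ == "__main__":
    ttl = 300.0  # hypothetical 5-minute TTL
    print(worst_case_failover_s(ttl, detection_s=60.0))  # 360.0
    for t in (0, 60, 150, 300):
        print(t, fraction_still_stale(t, ttl))
```

Under these assumptions, half of the cached answers are still stale halfway through the TTL, and nothing in this model shortens the tail except a lower TTL.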

And really the frustrating part is there seems to have been no headway on any of this stuff in a decade. Same old open source options, same old techniques. Can someone come up with a way to load balance in the cloud that a) doesn’t lose any hits, b) is one thing, not four things, and c) is useful for both front-end and back-end balancing? It seems like a necessary part of, oh, say, every system ever, so why is it still so hard?


Filed under Cloud, DevOps

Tagged as complaining, load balancing

31 responses to “Why Does Cloud Load Balancing Suck?”

Vincent Bernat

January 22, 2012 at 1:21 am

nginx now gives you compression, load balancing, SSL, and caching. Of course, its caching is pretty basic compared to Varnish, and its load-balancing functionality is pretty light compared to HAProxy.

- Steve S.
  
  September 28, 2012 at 7:21 pm
  
  nginx is awesome… as a web server. That’s what it was designed for. As a proxy, it lacks a lot of the key things the author mentions, such as proper HA and rich load balancing. To your point, HAProxy is a better choice if you want an open source load balancer.
  
- Phil Hendren
  
  February 12, 2015 at 12:25 pm
  
  You hear lots of great things about nginx, and I have had success with it in the past; however, there are some pretty big issues with it involving HTTP/1.1 compliance and other quality concerns. I learned about these issues during a presentation by Bryan Call, a well-respected engineer at Yahoo. Here are the slides if you want specifics: http://www.slideshare.net/bryan_call/choosing-a-proxy-server-apachecon-2014
  
Peco

February 3, 2012 at 9:56 am

If only someone would resurrect Resonate! It was a pioneer in the load balancing industry and unfortunately got eaten up by the .com crash. It was software-based, which meant you could install it anywhere and spread the load of your traffic. It also allowed non-HTTP load balancing for the oddball apps out there. Maybe the latest nginx version is worth a look as its proper successor.

- Steve S.
  
  September 28, 2012 at 7:19 pm
  
  Resonate used a networking hack that would not fly on AWS.
  
codycnzody Cooper

February 7, 2012 at 2:03 am

I use Amazon ELB and run Pingdom’s monitoring tool with a 1-minute check. I have yet to have any downtime as a result of ELB.

- ernestm
  
  April 23, 2012 at 6:36 pm
  
  That’s far too coarse a check interval to find problems like the ones I saw. It’s not continuous downtime; it’s 1 in 1000 hits being dropped. Very hard to find.
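
The arithmetic behind that reply: a synthetic check samples a tiny fraction of traffic, so a low drop rate barely registers in it. This sketch assumes a 0.1% drop rate (the “1 in 1000 hits”), one probe per minute, a hypothetical 100 requests per second of real traffic, and independent failures. The two-in-a-row figure matters only if a monitor were configured to alert after consecutive failed checks; that setting is an assumption here, not a claim about Pingdom.

```python
SECONDS_PER_DAY = 86_400


def expected_failures(requests: float, drop_rate: float) -> float:
    """Expected number of failed requests, assuming independent drops."""
    return requests * drop_rate


def prob_at_least_one_failure(requests: int, drop_rate: float) -> float:
    """Probability that at least one of `requests` independent requests fails."""
    return 1.0 - (1.0 - drop_rate) ** requests


def expected_consecutive_pairs(probes: int, drop_rate: float) -> float:
    """Expected number of back-to-back failed probe pairs (linearity of expectation)."""
    return max(probes - 1, 0) * drop_rate ** 2


if __name__ == "__main__":
    drop_rate = 0.001                        # "1 in 1000 hits"
    probes_per_day = SECONDS_PER_DAY // 60   # one check per minute: 1440
    real_per_day = 100 * SECONDS_PER_DAY     # hypothetical 100 req/s: 8.64M

    print(expected_failures(real_per_day, drop_rate))             # about 8640 dropped user hits
    print(expected_failures(probes_per_day, drop_rate))           # about 1.44 failed probes
    print(prob_at_least_one_failure(probes_per_day, drop_rate))   # about 0.76
    print(expected_consecutive_pairs(probes_per_day, drop_rate))  # about 0.0014
```

So thousands of real users can hit errors on a day when the monitor records roughly one isolated failed check, which is easy to dismiss as noise.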
  
Clara

April 10, 2012 at 5:11 pm

Has anyone ever had duplicate information transmitted due to a load balancing issue?

- ernestm
  
  April 23, 2012 at 6:42 pm
  
  We had that. It wasn’t the load balancing itself, per se, but the network redundancy around our front end, which ended up repeating packets on us. I forget what we did about it; mainly we yelled at our network admins…
  
José García Robles

May 15, 2012 at 11:48 am

Have you tested Zen Load Balancer? It is still in development, but it can be a very good open source load balancer solution. It evolves based on its users’ suggestions, and the development team is very active. If you don’t find the features you need, you can suggest them on the mailing list.

- John M.
  
  July 7, 2014 at 6:16 pm
  
  Zen Load Balancer is pretty bad. Configuration is really unstable; e.g., if you delete a farm, you wreck the entire server.
  
Steve S.

September 28, 2012 at 7:26 pm

NetScaler announced a “tech preview” (which means beta in their world) for AWS here: https://www.citrix.com/English/ne/news/news.asp?newsID=2324093. NetScaler is already available on a few other clouds, most notably as a self-serve offering at SoftLayer.

- ernestm
  
  October 14, 2012 at 9:34 pm
  
  Huh. Is it really AWS-native? I’ve gone through the whole snipe hunt of rooting through their marketing talk every time they claim “we have NetScaler for AWS,” and it always turns out to be something lame and on-premises. Well, I hope it livens up. At NI we used NetScalers; at BV we’re using F5 at Rackspace and HAProxy in AWS regions with UltraDNS GTM in front of it. That works decently, but automating changes to the HAProxy config is a PITA. I’m open to new solutions that actually work… I get frustrated at the FUD around cloud load balancing; I finally had to tell A10 to just stop calling me. “But do you have something for pure AWS systems?” “Sure! Let’s get you on a call with an engineer.” “Hello engineer, so will this work on a pure AWS system?” “Well, no…” Now, sadly, my attitude towards vendors is “I’m going to wait till someone reputable I know is actually using it before bothering to listen much.”
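
Much of the pain in automating HAProxy changes is regenerating the backend list whenever instances come and go. A minimal sketch, assuming you already have the current instance addresses from somewhere (the `instances` list and the `web_servers` backend name are hypothetical): render the `backend` section as text, then write it out and reload HAProxy however your deployment does that. The directives emitted (`backend`, `balance roundrobin`, `server <name> <ip:port> check`) are standard HAProxy configuration; the rest is illustrative.

```python
from typing import Iterable, Tuple


def render_backend(name: str, servers: Iterable[Tuple[str, str, int]],
                   balance: str = "roundrobin") -> str:
    """Render an HAProxy backend section as text.

    `servers` yields (server_name, ip, port) tuples. Each server line gets the
    `check` keyword so HAProxy health-checks it and stops routing to it when
    the check fails.
    """
    lines = [f"backend {name}", f"    balance {balance}"]
    # Sort so the output is stable between runs and diffs stay small.
    for server_name, ip, port in sorted(servers):
        lines.append(f"    server {server_name} {ip}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical instance list, e.g. pulled from your cloud provider's API.
    instances = [("web2", "10.0.0.12", 80), ("web1", "10.0.0.11", 80)]
    print(render_backend("web_servers", instances), end="")
```

Because the output is deterministic, comparing it with the file on disk tells you whether a reload is needed at all.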
  
Sergi de Pablos

September 30, 2012 at 6:51 am

And don’t forget the ELB slowness – up to 1.5s – on the SSL negotiation when using big SSL keys.

webfabric (@webfabric)

October 4, 2012 at 10:06 am

With Direct Connect, you can use the BigIP LBs in your datacenter to balance traffic to your cloud instances. http://aws.amazon.com/directconnect/

Malcolm Turnbull

October 10, 2012 at 11:42 am

I’m really surprised that you’re having issues with Amazon ELB; have you logged a support case? Netflix is open sourcing some of the tools they use, but the main thing is that Amazon has no interest in SSL or compression, because they would rather you do simple ELB and handle the CPU-intensive stuff in the cluster. We often get asked at Loadbalancer.org about whizzy layer 7 stuff in our Amazon EC2 product, which is based on HAProxy. Our answer tends to be “Don’t even think about it…” Our product is great for things Amazon ELB doesn’t do, like maintenance mode, extended health checks, RDP cookies, etc. But we would not recommend it for really high loads unless you clustered it behind ELB, much as Zeus (aka Riverbed) recommends.

- ernestm
  
  October 14, 2012 at 9:26 pm
  
  Yes, and not only with support; as I said in the article, we worked with the actual product manager for a while. We currently use GTM + HAProxy on the front end at BV. (Well, and Akamai, and Apache, and nginx.)
  
Dhawal Parkar

October 11, 2012 at 12:25 pm

In my quest for an internal, secured ELB (multi-tiered) architecture, I now have to use AWS VPC.

- ernestm
  
  October 14, 2012 at 9:23 pm
  
  Us too. Victor Trac from Bazaarvoice did a presentation at last month’s Austin Cloud User Group about our newest implementation at BV, basically a whole “virtual data center” in Amazon for our dev teams to use: VPCs, plus a CloudFormation template to bring up DNS, NAT, ENI attachers, and VPN tunnels… I’ll ping him about posting on it and about us open sourcing the GitHub project.
  
Tyk

January 18, 2013 at 7:59 pm

Why not spend some money on a Cisco ASA to load balance? It’s not free, but it works very well for me; see http://www.asavirtual.org, which has some articles and info. I use Cisco and Juniper, and at our main DC everything is Extreme Networks equipment except four Cisco units for LB and firewall. We also have some older HP servers running ESX for honeypot/IDS duty that “report” to the firewall.
In the lab we play with pf (pfSense) and CARP for HA; the idea is to set up multiple OpenVPN or IPsec tunnels across different locations (we have full and half racks of HP blades in .se/.de/.uk, USA/CA, and .ru/.ua). Any suggestions other than pfSense? Of course, free open source would be good. Don’t get me wrong, I love Extreme, Cisco, etc., but I believe it’s time to support good open source projects (donate) instead of feeding “extreme” big companies like Cisco (long-time customers of 10+ years should get some discount… NOT!). Oh, and I almost forgot: thanks for the article, and the comments are good reading too!

Abhishek Choudhary

September 25, 2013 at 2:02 pm

I won’t entirely reject the points mentioned, but I didn’t face these problems with the Jelastic cloud provider. The load balancing provided in Jelastic was a very easy and stable solution.
NGINX is the reverse proxy server used for HTTP load balancing (http://jelastic.com/docs/http-load-balancing), and it is quite efficient. TCP load balancing uses a round-robin algorithm for request distribution (http://jelastic.com/docs/tcp-load-balancing). So I don’t think it’s entirely right to say that the cloud doesn’t support proper load balancing 🙂
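
Round-robin distribution, as mentioned for Jelastic’s TCP balancing, simply hands each new connection to the next backend in a fixed rotation. A minimal sketch of the idea, not Jelastic’s implementation; the backend addresses in the example are hypothetical.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobin:
    """Pick backends in a fixed rotation, ignoring load and health."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        # Copy the list so later changes by the caller don't alter the rotation.
        self._ring: Iterator[str] = cycle(list(backends))

    def next_backend(self) -> str:
        return next(self._ring)


if __name__ == "__main__":
    rr = RoundRobin(["10.0.1.10:5432", "10.0.1.11:5432", "10.0.1.12:5432"])
    print([rr.next_backend() for _ in range(5)])
```

Plain round robin has no notion of health or current load, so on its own it keeps sending connections to a dead backend; real balancers pair it with health checks that take failed backends out of the rotation.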

Hendren

April 11, 2014 at 2:21 pm

I just watched an informative presentation at ApacheCon 2014 in Denver by Bryan Call, Sr. Principal Engineer at Yahoo, on Apache Traffic Server. He included a great deal of testing and benchmark data. In broad strokes, NGINX was one of the faster solutions, but it also had the least HTTP/1.1 compliance and the most errors. This would not be a problem at smaller traffic levels, but as traffic increased during the benchmark tests, so did the number of problems (certain use cases would be fine, but not for Yahoo).

Ron

June 24, 2014 at 6:24 pm

ATS has a load balancer plugin that lets you load balance and also run a reverse proxy, cache, and SSL acceleration, and with keepalived you can have redundancy as well. You also get the added benefit of a WAF if you implement ModSecurity on an NGINX or stripped-down Apache instance. Someone is even building a wrapper for ATS to include ModSecurity natively.

It’s extremely high performance, and even on a very inexpensive set of VMs it can process a ton of traffic. Several case studies show 2x to 2.5x the performance of HAProxy and an even larger margin over Varnish, and you get ESI support built into ATS, so Magento and PrestaShop hosts have a really excellent solution if they implement a full-page cache (FPC) with ESI support.

Thrawn

October 16, 2014 at 9:52 pm

Modern HAProxy can handle SSL and HTTP compression, so it’s just 3 levels: HAProxy, cache, webservers. You’re not going to get less than 2 levels (load balancer and webservers) in any case, so this shouldn’t be a big deal. And HAProxy has a tiny footprint, so I can’t see that you’d need to spend big bucks on the hardware for it.

Siva

January 13, 2015 at 7:47 am

Have you heard about Appcito CAFE?

Appcito, a venture-backed company, has developed a multi-cloud application delivery infrastructure for applications deployed on a public cloud (e.g., AWS) or a private enterprise cloud.

The Appcito Cloud Application Front-End Service (CAFE) delivers unified application services like load balancing, application security, and application front-end optimization, along with application analytics and insights. In addition, it provides traffic steering and analytics in DevOps continuous deployment workflows. Legacy approaches, using an Application Delivery Controller (ADC) or using open source to try to build these capabilities into the applications (DIY), are costly, not cloud-native, and have business models not conducive to the cloud. Appcito services are cloud-native, highly available, and elastic, and do not require customers to manage software or appliances.

Appcito CAFE provides these main advantages or benefits:
· Unified services: Appcito tightly integrates implementation of application availability (load balancing), security, performance and continuous deployment
· Analytics and Insights: Provides analytics and unique insights into application traffic
· Simple to activate: Appcito service activation takes less than five minutes.
· Cloud Native: Appcito’s service is entirely cloud-native: built using cloud tools for optimizing applications that will be deployed in the cloud.
The offering is available as a service with a consumption based business model.

The founding team is from Citrix NetScaler and F5.

CAFE load balances traffic across web servers, so websites and applications can cost-effectively scale operations. Our load balancers provide superior levels of functionality, enhanced flexibility and detailed visibility. CAFE also supports automatic failover between zones or data centers to accelerate disaster recovery in the event of an outage. Protect each of your applications at different levels of security through cross zone/DC load balancing.

Ryan Fligg

February 12, 2015 at 2:52 am

We have found EdgeDirector to be very useful. It is a DNS service that charges per query and allows you to geo-balance your DNS entries with alert checking of your services via various methods. If you don’t want to do the manual configuration of Apache with regions in response to the initiating IP, this is a great alternative.

Sarah G.

May 5, 2015 at 1:03 pm

Check out Avi Networks (www.avinetworks.com), which launched at the end of last year specifically to solve this problem. I’d welcome any thoughts.

- Brian Relch
  
  March 13, 2016 at 5:48 pm
  
  AVI seems to be just nginx and OpenSSL at its core. It’s got some cool automation and autoscaling that can save you time if you can figure it out yourself. For the pretty-lipstick portion, we are quite happy with the ELK stack.
  
Veracruz

June 4, 2015 at 3:51 pm

F5 does have a lightweight node.js-based proxy to fit in the spaces where full BigIP is too big or expensive… and it has support. I’ve only played with a demo but it does seem pretty flexible.

https://linerate.f5.com/learn

Ratna Malladi

November 16, 2015 at 5:22 pm

You might want to consider Appcito’s elastic load balancer. It has all the enterprise-grade features you are looking for (redundancy, SSL offload, traffic steering), and at the same time it’s built from the ground up for the cloud. It’s lightweight and relatively inexpensive. You can find more info here:
http://www.appcito.com/products/elastic-load-balancing/

- Jason Bridges
  
  February 28, 2016 at 11:00 pm
  
  Have you heard about appcito?? Oh my god it’s spam spam spam! Geez! Go away you freaking salesmen who do not know jack-squat about anything.
  
  Most likely your product will never touch HAProxy head-to-head, so stop with it.
  
  I have to agree on AWS: we ended up with nginx+HAProxy, and it seems to work well. Our provider provides this free of charge and simply charges us for traffic. I hate to even mention the name now that all this spam is here, but FiberPeer.com is who we use. In any case, we take nearly 150-300k hits per minute during busy times, especially after we release an update to our software. We have never once had a problem, to be honest. We have been there about 2 years, and twice they have had maintenance, once planned and once unplanned. Both times they moved the load balancer via DNS (5-minute TTLs; no customer complaints in either case).
  
  Anyway, I believe they primarily use nginx+HAProxy, though I am not completely certain of that. I do know they were making several changes to the versions they run, but still.
  
  Best,
  Jason
  