Why Does Cloud Load Balancing Suck?

Back in the old world of real infrastructure, we used Netscalers or F5s and we were happy.  Now in the cloud, you have several options, all of which seem to have problems.

1. Open source.  But once you want SSL, and redundancy, and HTTP compression, you get people saying with a straight face “nginx (for HTTP compression) –> Varnish cache (for caching) –> HTTP level load balancer (HAProxy, or nginx, or the Varnish built-in) –> webservers.” (Quoted from Server Fault).  Like four levels, often with the same software twice in it. And don’t forget some kind of heartbeat between the two front-ends. Oh look I’ve spent $150/mo on just machines to run my load balancing. And I really want to load balance/failover between all my tiers not just the front end.  It’s a lot of software parts to go wrong.

2. Zeus.  For some reason none of the other LB vendors have gotten off their happy asses and delivered a good software load balancer you can use in Amazon.  I got tired of talking to our Netscaler reps about it after the first couple years.  They’re more interested in selling their hardware to the cloud data centers than helping real people load balance their apps. Zeus is the only one – and it’s really quite expensive.

3. Amazon ELBs.  These just have a lot of problems under the hood.  We’ve been engaged with Amazon ELB product management on them – large files serve out super slow; users get hits refused due to throttling/changes during ELB scaling – basically if you want 100% of your hits to come through you can’t use them.

4. Geo-IP load balancing, through Dyn or whoever. They claim to have the failover problem fixed, but it still only works for the front end tier of what is a multitier architecture. I certainly don’t want to have to advertise every internal IP in external DNS to make load balancing work.

And really the frustrating part is there seems to have been no headway on any of this stuff in a decade. Same old open source options, same old techniques.  Can someone come up with a way to load balance on the cloud that a) doesn’t lose any hits, b) is one thing, not four things, and c) is useful for front and back end balancing?  Seems like a necessary part of, oh, say, every system ever. Why is it still so hard?


Filed under Cloud, DevOps

31 responses to “Why Does Cloud Load Balancing Suck?”

  1. nginx now allows you to have compression, load-balancing, SSL and now cache. Of course, cache functionality is pretty basic compared to varnish and load-balancing functionality is pretty light compared to haproxy.

    • Steve S.

      nginx is awesome… as a web server. That’s what it was designed for. As a proxy, it lacks a lot of the key things that the author mentions in terms of proper HA, rich load balancing, etc. To your point, HAproxy is a better choice if you want an open source load balancer.

    • You hear lots of great things about Nginx, and I have had success with it in the past. However, there are some pretty big issues with it involving HTTP/1.1 compliance and other quality concerns. I learned about these issues during a presentation given by Bryan Call, a well-respected engineer from Yahoo. Here is the slideshare if you want specifics: http://www.slideshare.net/bryan_call/choosing-a-proxy-server-apachecon-2014

  2. Peco

    If only someone would resurrect Resonate! It was a pioneer in the load balancing industry and unfortunately got eaten up by the .com crash. It was software based, which meant you could install it anywhere and spread the load of your traffic. It also allowed non-HTTP load balancing for the oddball apps out there. Maybe we need to look at the latest nginx version as its proper successor.

  3. I use Amazon ELB and run the monitoring tool from Pingdom on a 1-minute check. I have yet to have any down time as a result of ELB.

    • That’s way too infrequent a check to find something like the problems I saw. It’s not continuous downtime, it’s 1 in 1000 hits being dropped. Very hard to find.
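
To put rough numbers on the reply above, here is a small Python sketch. It assumes drops are independent and uniformly random at the 1-in-1000 rate quoted in the reply, and that the monitor sends exactly one probe per minute. It says nothing about how ELB or Pingdom actually behave, and the function names and the 100,000 hits per minute figure are made up for illustration.

```python
# Back-of-the-envelope model: every request is dropped independently
# with probability DROP_RATE (an assumption, not a measured ELB property).
DROP_RATE = 1 / 1000          # "1 in 1000 hits" from the reply above
PROBES_PER_DAY = 24 * 60      # one monitoring probe per minute


def expected_failures(drop_rate, requests):
    """Expected number of dropped requests out of `requests` attempts."""
    return drop_rate * requests


def prob_at_least_one_failure(drop_rate, requests):
    """Probability that at least one of `requests` attempts is dropped."""
    return 1.0 - (1.0 - drop_rate) ** requests


# What the once-a-minute monitor can expect to see in a day.
probe_failures_per_day = expected_failures(DROP_RATE, PROBES_PER_DAY)
chance_monitor_sees_any = prob_at_least_one_failure(DROP_RATE, PROBES_PER_DAY)

# What a hypothetical site taking 100,000 hits per minute loses each minute.
user_failures_per_minute = expected_failures(DROP_RATE, 100_000)

print(f"failed probes per day (expected): {probe_failures_per_day:.2f}")
print(f"chance the monitor sees any failure in a day: {chance_monitor_sees_any:.2f}")
print(f"dropped user requests per minute: {user_failures_per_minute:.0f}")
```

Under these assumptions the monitor sees on the order of one failed probe per day, which is easy to write off as a blip, while the hypothetical site is dropping about a hundred real requests every minute.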

  4. Clara

    Has anyone ever had duplicate info transmission due to a load balancing issue?

    • We had that. It wasn’t load balancing itself per se; the network redundancy around our front end ended up repeating packets on us. I forget what we did about it, mainly yelled at our network admins…

  5. Have you tested Zen Load Balancer? It is still in development, but it can be a very good open source load balancer solution. It evolves based on suggestions from its users, and the development team is very active. If you don’t find the features you need, you can suggest them on the mailing list.

    • John M.

      ZEN load balancer is pretty bad. Configuration is really unstable, e.g. if you delete a farm, you wreck the entire server.

  6. Steve S.

    NetScaler announced a “tech preview” (which means beta in their world) for AWS here: https://www.citrix.com/English/ne/news/news.asp?newsID=2324093. NetScaler is already available on a few other clouds, most notably you can get it as a self-serve at SoftLayer.

    • Huh. Is it really AWS native? I’ve gone through the whole snipe hunt of rooting through their marketing talk every time they claim “we have netscaler for AWS” and it always turns out to be something lame and on premise. Well, I hope it livens up. At NI we used Netscalers, at BV we’re using F5 at Rackspace and HAProxy in AWS regions with UltraDNS GTM in front of it, works decent but automating the changes to HAproxy config is a PITA. I’m open to new solutions that actually work… I get frustrated at the FUD around cloud load balancing, I had to finally tell A10 to just stop calling me. “But do you have something for pure AWS systems?” “Sure! Let’s get you on a call with an engineer.” “Hello engineer, so will this work on a pure AWS system?” “Well no…” Now sadly my attitude towards vendors is “I’m going to wait till someone reputable I know is actually using it before bothering to listen much.”

  7. And don’t forget the ELB slowness – up to 1.5s – on the SSL negotiation when using big SSL keys.

  8. With Direct Connect you can use your BigIP LBs from your datacenter to your cloud instances. http://aws.amazon.com/directconnect/

  9. I’m really surprised that you’re having issues with Amazon ELB. Have you logged a support case? Netflix is open sourcing some of the tools that they use, but the main thing is that Amazon has no interest in SSL or compression because they would rather you do simple ELB and handle CPU-intensive stuff in the cluster. We often get asked at Loadbalancer.org about wizzy layer 7 stuff in our Amazon EC2 product, which is based on HAProxy. Our answer tends to be “Don’t even think about it…” Our product is great for things Amazon ELB doesn’t do, like maintenance mode, extended health checks, RDP cookies etc… But we would not recommend it for really high loads unless you clustered it behind ELB… much like Zeus (aka Riverbed) recommends….

    • Yes, not only support but as I said in the article we worked with the actual product manager for a while. We currently use GTM + HAProxy on the front end at BV. (Well, and Akamai, and Apache, and nginx.)

  10. In my quest for an internal secured ELB (multi-tiered) architecture, I now have to use AWS VPC.

    • Us too. Victor Trac from Bazaarvoice did a presentation at last month’s Austin Cloud User Group about our newest implementation at BV, basically a whole “virtual data center” in Amazon for our dev teams to use. VPCs, a CloudFormation template to bring up DNS, NAT, ENI attachers, VPN tunnels… I’ll ping him on posting about it/us open sourcing the github project.

  11. Tyk

    Why not spend on some Cisco ASAs for load balancing? It’s not free, but for me it works very well; see http://www.asavirtual.org, which has some articles and info. I use Cisco and Juniper, and at our main DC everything is Extreme Networks equipment except 4x Cisco for LB and firewall. We also have some older HP servers running ESX for honey/IDS that “report” to the firewall.
    In the lab we are playing with pf (pfSense) and CARP for HA. The idea is to set up multiple OpenVPN or IPsec tunnels through different locations (we have HP blades in full and half racks in .se/.de/.uk, USA/CA and .ru/.ua). Any suggestions other than pfSense? Of course free open source would be good. Don’t get me wrong, I love Extreme, Cisco etc., but I believe it’s time to support good open source projects (donate) instead of feeding “extreme” big companies like Cisco (long-time customers of 10+ years should get some discount… NOT!). Oh, and I almost forgot: thanks for the article and the comments, good reading!

  12. I won’t entirely reject all the points mentioned, but I would say I didn’t face these problems with the Jelastic cloud provider. The load balancing provided in Jelastic was a very easy and stable solution. NGINX is the reverse proxy server used for HTTP load balancing, and it is quite efficient. TCP load balancing uses a round-robin algorithm for request distribution. So I feel it’s not entirely right to say that the cloud doesn’t support proper load balancing 🙂
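
For readers unfamiliar with the round-robin distribution mentioned in the comment above, here is a minimal Python sketch of the idea. It is not Jelastic’s or NGINX’s implementation; the class name and the backend addresses are hypothetical, and real balancers add health checks, weights and connection tracking on top.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotating order, one per request."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        # Copy the list so later changes by the caller do not affect rotation.
        self._backends = list(backends)
        self._rotation = cycle(self._backends)

    def next_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._rotation)


# Hypothetical backend addresses, for illustration only.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
picks = [balancer.next_backend() for _ in range(5)]
print(picks)
```

Each backend gets every third request regardless of how busy it is, which is why round robin is simple and predictable but blind to uneven load.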

  13. Hendren

    I just watched an informative presentation on Apache Traffic Server at ApacheCon 2014 in Denver by Bryan Call, Sr. Principal Engineer at Yahoo. He included a great deal of testing/benchmark data. The broad strokes on NGINX were that it was one of the faster solutions, but it also had the least compliance with HTTP/1.1 and the most errors. While that would not be a problem for smaller amounts of traffic, as the traffic levels increased during benchmark tests, so did the number of problems (certain use cases would be fine, but not for Yahoo).

  14. Ron

    ATS has a load balancer plugin that allows you to load balance and have a reverse proxy, cache, SSL acceleration, and with keepalived, you can have redundancy as well. It gives you a double benefit of being a WAF if you implement modsecurity on an NGINX or stripped Apache instance. There is even someone building a wrapper for ATS to include modsecurity natively.

    It’s extremely high performance, and even on a very inexpensive set of VMs, it can process a ton of traffic. There are several case studies that show a 2x to 2.5x performance over HAProxy and even more than Varnish, and you get the benefit of ESI support built in with ATS, so Magento and Prestashop hosts have a really excellent solution if they implement an FPC with ESI support.

  15. Thrawn

    Modern HAProxy can handle SSL and HTTP compression, so it’s just 3 levels: HAProxy, cache, webservers. You’re not going to get less than 2 levels (load balancer and webservers) in any case, so this shouldn’t be a big deal. And HAProxy has a tiny footprint, so I can’t see that you’d need to spend big bucks on the hardware for it.

  16. Heard about Appcito CAFE.

    Appcito, a venture backed company, has developed a multi-cloud application delivery infrastructure for applications deployed over a public (e.g. AWS) or private enterprise cloud like.

    The Appcito Cloud Application Front-End Service (CAFE) delivers unified application services like load balancing, application security, application front-end optimization along with application analytics and insights. In addition, it provides traffic steering and analytics in DevOps continuous deployment workflows. Legacy approaches using Application Delivery Controller (ADC) or using open source to try and build these capabilities in the applications (DIY) are costly, not cloud native and with business models not conducive to cloud. Appcito services are cloud-native, highly available and elastic and do not require customers to manage software/appliances.

    Appcito CAFE provides these main advantages or benefits:
    · Unified services: Appcito tightly integrates implementation of application availability (load balancing), security, performance and continuous deployment
    · Analytics and Insights: Provides analytics and unique insights into application traffic
    · Simple to activate: Appcito service activation takes less than five minutes.
    · Cloud Native: Appcito’s service is entirely cloud-native: built using cloud tools for optimizing applications that will be deployed in the cloud.
    The offering is available as a service with a consumption based business model.

    Founding team is from Citrix Netscaler and F5.

    CAFE load balances traffic across web servers, so websites and applications can cost-effectively scale operations. Our load balancers provide superior levels of functionality, enhanced flexibility and detailed visibility. CAFE also supports automatic failover between zones or data centers to accelerate disaster recovery in the event of an outage. Protect each of your applications at different levels of security through cross zone/DC load balancing.

  17. We have found EdgeDirector to be very useful. It is a DNS service that charges per query and allows you to geo-balance your DNS entries with alert checking of your services via various methods. If you don’t want to do the manual configuration of Apache with regions in response to the initiating IP, this is a great alternative.

  18. Sarah G.

    Check out Avi Networks (www.avinetworks.com), which launched at the end of last year just to solve this problem. Would welcome any thoughts.

    • Brian Relch

      AVI seems to be just nginx and OpenSSL at its core. It’s got some cool automation and autoscaling that can save you time if you can figure it out yourself. For the pretty lipstick portion we are quite happy with the ELK stack.

  19. Veracruz

    F5 does have a lightweight node.js-based proxy to fit in the spaces where full BigIP is too big or expensive… and it has support. I’ve only played with a demo but it does seem pretty flexible.


  20. You might want to consider Appcito’s elastic load balancer. It has all the enterprise grade features (redundancy, SSL offload, traffic steering) you are looking for and at the same time, it’s built from the ground up for the cloud. It’s lightweight and relatively inexpensive. You can find more info here

    • Jason Bridges

      Have you heard about appcito?? Oh my god it’s spam spam spam! Geez! Go away you freaking salesmen who do not know jack-squat about anything.

      Most likely your product head-to-head will never touch haproxy – so stop with it.

      I have to agree on AWS – we ended up with nginx+haproxy and it seems to work well. Our provider provides this free of charge and simply charges us for traffic. Hate to even mention the name now that all this spam is here. FiberPeer.com is who we use – in either case, we take near 150-300k hits per minute during busy times, especially after we release an update to our software. Never once have had a problem, to be honest. We have been there about 2 years and twice they have had maintenance, once planned and once unplanned. Both times they moved the load balancer via DNS (5-minute TTLs), with no customer complaints in either case.

      Anyway, I believe they primarily use nginx+haproxy – I am not completely certain of that. I do know they were making several changes on the versions they run but still.

