
Containers, Configuration Management, and The Right Tool for the Right Job

Docker brings an incredibly appealing method of managing applications to the table, but also requires us to rebuild a lot of systems that aren’t broken. In this post I’m going to look at the pros and cons of Docker and its accompanying ecosystem, and take a look at how one might start to leverage the best parts of Docker without rewriting everything.

What is it about Docker that is so exciting? It was a moonshot experiment that struck home. Rather than providing an incremental improvement over existing infrastructure patterns, Docker takes several leaps forward by providing a fresh way of managing our applications and making them accessible to developers at the same time.

Part of Docker’s success relies on providing highly opinionated solutions to the problems that come with containerizing components in your system. While these solutions are invaluable in terms of accessibility and gaining adoption, they are neither the only solutions nor necessarily the best ones in every case. Even if you are sure you want to subscribe to the opinionated “Docker way” or think it’s a worthwhile trade-off, you will still be accepting a new set of problems in exchange for the old ones, but the new set doesn’t come with the benefit of a decade or so of tools and experience to leverage.

In this post I’m going to discuss what makes Docker attractive to me and what I’m not such a fan of. Then I’m going to explore a hybrid approach that seeks to take advantage of the best parts of Docker without locking me in to the not-so-great parts.

P.S. I was hoping to put together a working demo of the architecture I describe below, but the proposed integration is still not possible… so that’s not going to happen. I’ve sat on this post for a while hoping things would change, but they haven’t, and I’ve decided instead to put this out there as is, as a theoretical architecture.

The Good

Some aspects of Docker are home runs. In a time when microservices rule and developing software for the cloud is an obvious choice, how can you pass up a tool that makes managing a gazillion apps as cheap and easy as managing your monolith? And it’s a DevOps game changer: in the same way that AWS removed the friction between dev and ops for provisioning a VM, Docker removes the friction of configuring an app’s dependencies and installation. What’s more, local dev of even dozens of applications can be kept lean, and we’re closer than ever to feeling confident in “it works on my laptop.”

In summary:

  • Dense clouds
  • Transactional installs
  • Bundled dependencies
  • Tools for packaging deterministic and repeatable deploys (and rollbacks)
  • Developer workflow is simple and production-like

The Bad

A lot of the design decisions of Docker involve trade-offs, and networking is no exception.

For local dev, where managing many VMs is especially difficult and high availability isn’t important, Docker’s unique method of standing up a virtual bridge interface and allocating new IPs as containers are started is really convenient. But it is less than complete when you start worrying about high availability and exposing your services to external systems. Layering in additional networking or mapping ports to the bridge interface starts to solve this problem, but it also leaves you with a jumble.

Service discovery largely abstracts away this jumble, but at this point we’ve gone through at least three networking transformations to effectively address our services and we haven’t even started to integrate with non-Docker managed services.

Don’t get me wrong, I definitely think service discovery is the way to go. I just think that since Docker has coupled networking so tightly with its implementation, it should have made it more flexible and done more to make inter-host communication work the same as intra-host communication. Additionally, better hooks to integrate with existing software-defined networks would make all the integration work feel less like a yak shave.

Isolation security is also a concern, but it is easier to shrug off because a fix should be coming soon. For the time being, however, Docker containers lack user namespaces, so UID 0 (root) in a container is also UID 0 (root) on the host machine and has all the access that comes with that.
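If you want to see this for yourself, one quick check from inside a container is to read /proc/self/uid_map, which shows how container UIDs map to host UIDs. A minimal sketch (my illustration, not a Docker tool):

```python
#!/usr/bin/env python
# With no user namespace remapping, /proc/self/uid_map is the identity
# mapping "0 0 4294967295": UID 0 inside the container is UID 0 on the host.

def uid_is_remapped():
    with open("/proc/self/uid_map") as f:
        inside, outside, count = f.readline().split()
    return not (inside == "0" and outside == "0" and count == "4294967295")

if __name__ == "__main__":
    if uid_is_remapped():
        print("user namespaces active: container root != host root")
    else:
        print("no remapping: container root IS host root")
```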

Another concern is Docker Hub. Although you don’t have to use this service or the images housed there in production, it’s presented in such a way that many people still do. Even with the addition of download signature checks, we are still left with an index of images that aren’t particularly well vetted or necessarily kept up to date. Many are built on top of full OSes that expose a larger attack surface than is necessary, and there is no established technique to ensure the base OS is up to date on its patches. There are great resources for building thin base OSes and ways to make sure they are kept up to date, but this is still more management left unaddressed.

In summary:

  • User namespace security
  • Docker Hub security
  • Networking

The Ugly

One of the first things you realize with Docker is that you have to rethink everything. You’re drawn in by the prospect of encapsulating your apps in a clean abstraction, but after fooling around with Supervisord for a while, most people start down the slippery slope of rewriting their entire infrastructure to keep the implementation clean.

This is because Docker isn’t very composable when taken as a whole. If you want to talk to something in a Docker container, you need an ambassador. If something in a Docker container needs to talk to something not managed by Docker, you need an ambassador. Sometimes you even need an ambassador for two apps both running in containers. This isn’t the end of the world, but it’s more work, and it’s a symptom of how Docker containers are really only composable with other Docker containers, not with our systems as a whole.
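For the curious, an ambassador is conceptually just a dumb proxy at a well-known address. A minimal sketch of the idea (the upstream address is hypothetical; real setups usually use socat or a tiny proxy container rather than hand-rolled code):

```python
#!/usr/bin/env python
# Minimal ambassador sketch: the app talks to a fixed local address,
# and the ambassador forwards to wherever the real service lives.
import socket
import threading

LISTEN = ("0.0.0.0", 6379)                   # what the app connects to
UPSTREAM = ("redis.example.internal", 6379)  # hypothetical real backend

def pump(src, dst):
    # Copy bytes one way until the connection closes, then signal EOF.
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass

def serve():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(LISTEN)
    listener.listen(5)
    while True:
        client, _ = listener.accept()
        backend = socket.create_connection(UPSTREAM)
        threading.Thread(target=pump, args=(client, backend)).start()
        threading.Thread(target=pump, args=(backend, client)).start()

if __name__ == "__main__":
    serve()
```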

What this means is that to leverage the parts of Docker we want (the transactional installs, bundled dependencies, and simplified local dev), we have to rewrite and rewire a whole lot of other stuff that wasn’t broken or giving us trouble. And even where it was, you’re forced to tackle it all at once.

If we were being honest about the shipping container analogy, we’d end up with a container ship not just carrying containers but built with containers as well!

A lot of these problems come from the same thing that makes Docker so easy to use: the bundling (and resulting tight coupling) of all components needed to build, manage, and schedule containers.

This becomes especially clear when trying to place Docker on Gabriel Monroy’s Strata of the Container Ecosystem. Although he places Docker in layer 4 (the container engine), aspects of it leak into almost every layer. It’s this sprawl that makes Docker less composable and is why integrating with it often entails a huge amount of work.

Summary:

  • Incompatible with config management
  • Not composable with existing infrastructure patterns and tools

If not Docker, Then What?!

I’m not saying we should forget about Docker and go back to the tried-and-true way of doing things. I’m not even saying to wait for the area to mature and produce better abstractions. I’m simply suggesting you do what you’ve always done: Choose the right tool for the right job, pragmatically apply the parts of Docker that make sense for you, and remember that the industry isn’t done changing, so you should keep your architecture in a state that allows for iteration.

Other players in this space

Part of Docker’s appeal is its dirt simple bundling of a new approach that removes a lot of pain we’ve been having. But there are other tools out there that solve these same problems.

  • System containers like OpenVZ or LXD provide similar cloud density characteristics
  • rkt is (almost ready to be) a competing application container that promises to implement a more composable architecture
  • Snappy Ubuntu offers an alternative model for transactional installs, bundling dependencies, and isolation
  • Numerous SDN technologies
  • Config management (Puppet, Chef, Ansible) provides deterministic and repeatable deploys
  • Vagrant simplifies local development in production-like environments

I have no doubt that we will look back and see Docker as the catalyst that led to a very different way of treating our software, but it isn’t going to stay the only major player for long, and some of the old rules still apply: Keep your architecture in a place that allows for iteration.

LXD, config management, Docker, and the next generation of stateless apps

So what would a cloud architecture that adopts just the good parts of Docker look like?

First off, here are the characteristics that are important to me and that I would like to support:

  • The density and elasticity of containers.
  • Transactional installs and rollbacks for my applications.
  • The ability to develop locally in a near production environment (without near production hardware).
  • Ephemeral systems.
  • Managed systems (I’m not comfortable making them immutable, because I trust config management tools more than a home-built “re-roll everything” script to protect me against the next bash vulnerability).
  • A composable platform that doesn’t rely on the aspects of Docker (like networking) that would make iterating on it difficult.

One way to accomplish this is to replace our traditional VMs with system containers like LXD, continue managing the infrastructure on those nodes the same way we always have with tools like Puppet, and start installing the service each node maintains in an application container like Docker.

I wish I could put together a demo to illustrate this, but right now running Docker on LXD is infeasible.
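In the spirit of a theoretical architecture, though, here’s roughly what provisioning one node might look like, shelling out to the lxc and docker CLIs from Python. The node and image names are hypothetical, and this assumes the Docker-on-LXD combination eventually works:

```python
#!/usr/bin/env python
# Theoretical provisioning flow for the hybrid architecture above.
import subprocess

NODE = "web1"                 # hypothetical system container name
IMAGE = "myorg/myapp:1.2.3"   # hypothetical application image

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Stand up a system container in place of a traditional VM.
run(["lxc", "launch", "ubuntu:trusty", NODE])

# 2. Converge it with config management, the same way we always have.
run(["lxc", "exec", NODE, "--", "puppet", "agent", "--test"])

# 3. Install the one service this node runs as an application container,
#    exposed on a known port to abstract away the nonstandard networking.
run(["lxc", "exec", NODE, "--",
     "docker", "run", "-d", "-p", "80:80", "--name", "app", IMAGE])
```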

With this setup, we would have to change relatively little: We can expose our application on known ports to abstract away nonstandard networking; we only have one app so the namespace security vulnerability isn’t a problem; and our infrastructure only needs incremental updates to support a container instead of a process.

Scaling can be achieved by adding new system container instances, and, while not as fast as spinning a new application container, it’s still something that can be automated. We also don’t have quite the same density, but we’ve retained most of the benefits there as well.

With this as a starting point, we can build out more twelve-factor cloud support based on what’s most important to us: Service discovery, externalized config, software-defined networking, etc.
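As a tiny illustration of the externalized-config piece: twelve-factor config just means everything that varies between environments arrives via environment variables, so the same artifact runs unmodified in dev, staging, and prod. A minimal sketch (the variable names are hypothetical):

```python
#!/usr/bin/env python
# Twelve-factor style config: read everything environment-specific
# from the environment, with sane dev defaults where appropriate.
import os

DATABASE_URL = os.environ["DATABASE_URL"]  # required; fail fast if missing
DISCOVERY_URL = os.environ.get("DISCOVERY_URL", "http://localhost:8500")
DEBUG = os.environ.get("DEBUG", "false").lower() == "true"

print("connecting to %s (debug=%s)" % (DATABASE_URL, DEBUG))
```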

Conclusion

The tired old debate of “Containers vs. Configuration Management” is rooted in fallacy. We’ve never tried to solve all of our problems with one technology before (You wouldn’t pull down your Java libraries with Puppet, or keep your load balancer config in Maven Central), so why do we have to start now?

Instead, I recommend we do what we always do: Choose the right tool for the right job.

I think there is definitely room for system containers, application containers, and configuration management to coexist. Hopefully more work will be done to make these technologies play nicely together.

I (Ben Schwartz) am a software architect at Kasasa by BancVue where lately I’ve been spending most of my time standing on the shoulders of our awesome DevOps culture to play with emergent infrastructure tools and techniques. Sometimes my experimentation makes it to my blog (including the original posting of this article) which can be found at txt.fliglio.com.

This article is part of our Docker and the Future of Configuration Management blog roundup running this November. If you have an opinion or experience on the topic, you can contribute as well.


CloudAustin June Meeting – Best Practices for Scalability

The Agile Admins also organize the CloudAustin user group, and we wanted to let everyone know about our upcoming June meeting. It’s 6-8 PM on Tuesday June 16 at Rackspace. RSVP on the meetup page!

Talk: Best Practices for Scalability (Scale to more than a Billion hits/day)

In this talk Chander Dhall will share his real-world experiences in scaling web apps, and some key insights and best practices. You’ll learn how to architect and develop applications on any Web stack so that they are easy to scale. If time permits Chander will go deep into performance too.

Chander is a Microsoft MVP, ASP.NET Insider, Web API Advisor, INETA speaker and open source contributor, with years of experience in enterprise software development. He started coding when he was 6, and created his first successful software product at the age of 14. He is the dev chair of DevConnections, and he works in a goal-oriented, technologically-driven, fast-paced Agile (SCRUM) environment. He has a master’s degree in computer science with specialization in algorithms, principles and patterns, and is focused on building high-performing modular software. Chander leads the HTML5/Node.js group in Los Angeles and the .NET user group at UT Dallas, co-organizes the AngularJS meetup in Austin, and has spoken at numerous conferences and code camps all over the world. http://chanderdhall.com/, Twitter @csdhall

Sponsor: Box.com

Come on out!  And if you want to speak or sponsor in the future, just email austin-cug-admin@googlegroups.com.


Awesome Upcoming Austin Techie Events

We’re entering cool event season…  I thought I’d mention a bunch of the upcoming major events you may want to know about!

In terms of repeating meetings you should be going to,

  • CloudAustin – Evening meeting every 3rd Tuesday at Rackspace for cloud and related stuff aficionados! Large group, usually presentations with some discussion.
  • Agile Austin DevOps SIG – Lunchtime discussion, Lean Coffee style, at BancVue about DevOps. Sometimes fourth Wednesdays, sometimes not. There are a lot of other Agile Austin SIGs and meetings as well.
  • Austin DevOps – Evening meetup all about DevOps.  Day and location vary.
  • Docker Austin – First Thursday evenings at Rackspace, all about Docker.
  • Product Austin – Usually early in the month at Capital Factory. Product management!


AWS re:Invent Keynote Day 1 Takeaways

Sadly I couldn’t attend this year, but heck, that’s what the Internet is for. Here are the interesting bits from the AWS re:Invent Day 1 keynote (livestreamed here). Loads of interesting stuff.

  1. AWS is growing revenue >40% YOY, far outstripping other large IT companies – EC2 use grew 99% YOY and S3 usage 137%, they have 1M active customers now. (Microsoft cloud services report 128% YOY growth as well.)
  2. New product announcement for Aurora – new commercial-grade database engine – fully MySQL compatible but 5x the performance, available through Amazon RDS, 1/10 the cost of the commercial DB engines (starts at 29 cents an hour, ~$210/mo). Can do 6M inserts/second and 30M selects/second. Highly durable (11 9’s), crash recovery in seconds with no data loss. Nice!
  3. SDLC stuff!
    1. CodeDeploy (was an internal tool called Apollo), a new code-deployment system that does rolling updates and rollbacks and tracks deployment health. This works for all languages and is free. They use it internally for 95 deploys/hour on their own stuff.
    2. In early 2015 will come some more software lifecycle management services – first is CodePipeline for continuous integration and deployment (also used internally)
    3. Second is CodeCommit, a managed code repository that can colocate with where you’re going to deploy and has no size limits on repos or files. These “integrate with” GitHub, Jenkins, Chef, etc., though it’s not clear how they don’t cannibalize them.
  4. Security stuff! Big push to be able to say “we easily surpass the security you can do on premise.”
    1. FISMA, ITAR, FIPS, FedRAMP, HIPAA, ISO 9001
    2. Current encryption approach is either “let Amazon manage keys” or use their CloudHSM hosted key thing, both of which are still a pain. As a result they’re launching AWS Key Management Service, an HA service that manages keys, provides one-click encryption, and does transparent key rotation.
    3. AWS Config is a new-gen agile CMDB with full visibility into all your AWS resources. You can query it and see relationships and show scope of a config change. Streams all config changes out to you.
    4. A new-gen service catalog called AWS Service Catalog available early 2015. Create and share product portfolios, let internal people launch them, tracking and compliance.
  5. Enterprise Cloud Adoption Patterns
    1. Often the first wave of moving into the cloud for enterprises is moving dev and test environments to run in AWS, for flexibility and spin-up/spin-down cost savings, plus brand new apps custom-written for the cloud.
    2. Second wave is web sites and digital transformation (media, corp sites, ecomm) and analytics, since mass processing and sharing is cheap in the cloud – data warehouses (like Pfizer’s). And mobile app back ends – phone, tablet, GPS, more.
    3. Third wave is business critical applications.  Macmillan and Hoya run their SAP in AWS. Conde Nast runs HR and Legal there.
    4. New wave – you’re starting to see entire datacenter migration and consolidation as DCs come up for lease (Hess, Conde Nast, NewsCorp). SunCorp, Time Inc., GPT, and Nippon Express are moving “all in” to AWS – many ISVs as well. The CIA moved to AWS, and now Intuit is doing so as well.
    5. Intuit moved their “TurboTax AnswerXchange” app there to deal with tax time peaks last year and the scales fell from their eyes when they did so – 6x cost cut, setup 1/5 of the time, faster development. They started doing more and realized the global datacenters, ease of integration with acquisitions, and dev recruiting benefits. They have 33 services on AWS now, and have moved mint.com there. They have decided to move everything else there now. Funny how once companies start looking at how much they accomplish instead of just the monthly cost the “cloud is more expensive at scale” argument gets dropped like a flaming bag of poo.
  6. Hybrid cloud
    1. Various stuff like directory service (AD in the cloud) and identity federation and storage gateway and SystemCenter and vCenter integration already exist to power mixed shops
    2. Johnson & Johnson went on for a while about their use of AWS.  They are planning a 25,000 seat deployment of Workspaces (virtual desktop offering, like Citrix).

Whew, that’s the quick notes version. Aurora is obviously of interest – a lot of the fretting over whether to use MySQL or RDS I’ve seen will get settled by this – it was just “well, run the same thing yourself or have them do it…” and now it’s “have them run something insanely better.” But the SDLC tools are also interesting – they made noise about how these “work with!” Ansible, Jenkins, git, etc., but that seems mildly disingenuous; without looking into it any further, they sound more like direct competition for them. But the config and service catalog could be great extensions – yay for simple composable services, not huge painful “BSM/ITMOM suites.”

Feel free and share your thoughts on the announcements in the comments section!


The Cloud Procurement Pecking Order

I was planning to go to this meeting here in town about “Preparing for the post-IaaS phase of cloud adoption” and it brought home to me how backwards many organizations are when they start thinking about cloud options. So now you get Ernest’s Cloud Procurement Pecking Order.

What many people are doing is moving in order of comfort, basically, as they start moving from old school on prem into the cloud.  “I’ll start with private cloud… Then maybe public IaaS… Eventually we’ll look at that other whizbang stuff.” But here’s what your decision path should be instead. It’s the logical extension of the basic buy vs build strategy decision you’re used to doing.

Cloud Procurement Flowchart

Look at the functionality you are trying to fulfill. Now ask, in order (there’s a toy code sketch of this decision order after the list):

  1. Is it available as a SaaS solution?  If so, use that. You shouldn’t need to host servers or write code for many of your needs – everything from email to ERP is commoditized nowadays. This is the modern equivalent of “buy, don’t build.” You don’t get 100% control over the functionality if you buy it, but unless the function is super core to your business you should simply get over that.
  2. [Optional] Does it fit the functional profile to do it serverless? Serverless is basically “second gen PaaS with less fiddly IaaS in it” so this would be your second step. Amazon has Lambda and Azure and Google have shipped competitors already. Right this moment serverless tech is still pretty bleeding edge, so you’d be forgiven for skipping this step if you don’t have pretty high caliber techies on staff.
  3. Can I do it in a public PaaS?  Then use a public PaaS (Heroku/Beanstalk/Google App Engine/Azure), unless you have some real (not FUD) requirements to do it in house.
  4. Can I do it in a private PaaS? Then use Cloud Foundry or similar. Or do you really (for non-FUD reasons) need access to the hardware?
  5. Can I do it in public IaaS? Then use Amazon or Azure. Or do you really (for non-FUD reasons) need it “on premise” (probably not really on premise, but in some datacenter you’re leasing – which is different from being outsourced to the cloud how, exactly)? Even hardcore hardware rendering is done in the cloud nowadays (you can get GPU-driven instances, SSDs, etc.).
  6. Can I do it in a private cloud? Use VMware Cloud or OpenStack. This is your final recourse before doing it the old-fashioned way – unless you have extremely unique hardware requirements, you probably can. Also, you can do hybrid cloud – basically private cloud plus public cloud (IaaS only, really). This gets you some of the IaaS benefits while complicating your architecture.
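Here’s that toy sketch of the pecking order as a decision function. The flags are hypothetical stand-ins for the questions above; answer them honestly (real requirements, not FUD) and take the first level that fits:

```python
#!/usr/bin/env python
# A toy sketch of the procurement pecking order, not a real library.

def procure(available_as_saas=False, fits_serverless=False,
            bleeding_edge_team=False, must_run_in_house=False,
            needs_hardware_access=False, must_be_on_premise=False):
    if available_as_saas:
        return "SaaS (buy, don't build)"
    if fits_serverless and bleeding_edge_team:
        return "serverless (Lambda and friends)"
    if not must_run_in_house:
        return "public PaaS (Heroku/Beanstalk/GAE/Azure)"
    if not needs_hardware_access:
        return "private PaaS (Cloud Foundry or similar)"
    if not must_be_on_premise:
        return "public IaaS (AWS, Azure)"
    return "private cloud (VMware/OpenStack) - the last resort"

print(procure(fits_serverless=True))  # -> public PaaS for most teams
```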

What About Compliance?

Very few compliance requirements exist that cannot be satisfied in the cloud.  There are large financials operating in the cloud, people with SOX and PCI and FISMA and NIST and ISO compliance needs… If your reason for running on prem is “but compliance” there’s a 90% chance you are just plain wrong, and coasting on decade-old received wisdom instead of being well informed about the modern state of cloud technology and security and compliance. I’ve personally helped pure-cloud solutions hit ISO and TUV and various other compliance goals.

What About The Cost?

This ordering seems to be inverted from how people are inching into the cloud. But the lower on this list you are, the less additional value you are getting from the solution (assuming the same price point). You should instead be reluctantly dragged into the lower levels on this list – which require more effort and often (though not always) more expense. A higher level needs to be a lot more expensive to justify the additional complexity and lag of doing more of the work yourself.

“But what about the cost,” you say, “the cloud gets more expensive than me running a couple servers?” It’s easy to be penny wise but pound foolish when making cloud cost decisions.

You need to keep in mind the real costs of your infrastructure when you do this – I see a lot of people spending a lot of work on private cloud that they really shouldn’t be. If you simply compare “buying servers” with “cost per month in Amazon” it can seem, using a naive analysis, like you need to go hybrid on prem after a couple hundred thousand dollars appear on your bill. But:

1. Make sure you are taking into account your fully loaded cost (including data center, power, cooling, etc.) of all the assets (servers, storage, network…) you are using to do this privately. Use the real numbers, not the “funny money” numbers – at a previous company we allocated network and other shared costs across the entire company, while “our IT budget” had to pay for servers, so that was the only number used in the comparison, since only our own department’s costs were considered. Don’t be a goon (technical term for a local optimizer); you should consider what it’s costing your entire company. Storage especially is way cheaper in the cloud versus enterprise SANs.

2. Make sure you are taking into account the cost of the manpower to run it. That’s not just the techies’ salaries (fully loaded with benefits/bonuses); it’s also the proportion of each layer of management going up that has to deal with their concerns (even if the director only has to spend 30% of his time messing with the data center team, and the VP 10%, and the CTO 5%, and the CEO 1% – that’s a lot of freaking money you need to account for). It’s also the opportunity cost of having people (smart technical people) doing your plumbing instead of doing things to forward your company. I would argue that instead of putting the employee’s salary into this calculation, you’d do better to put in your revenue per employee! Why? Because for that same money you could have someone improving the product, making sales, etc., and making you additional revenue. If all you are looking at is “cost reduction,” you are probably divorced enough from the business goals of your organization that you are not making good decisions. This isn’t to say you don’t need any of that manpower, but ideally, with more of the plumbing outsourced, you can turn their technical skills to something of more productive use.

3. Make sure you are taking into account the additional lag time and the cost of that time-to-market delay from DIYing. Some people couch this as just for purposes of innovation – “well, if you’re a small, quick moving, innovative firm or startup, then this velocity matters to you – if you’re a larger enterprise, with yearly budget cycles, not so much.” That’s not true. Assuming you are implementing all this stuff with some end goal in mind, you are burning value along with time the longer it takes you to deliver it – we like to call that cost of delay. Heck, just plain cost of money over that period is significant – I’ve seen companies go through quite a set of gyrations to be able to bill 30 days earlier to get that additional benefit; if you can deliver projects a month earlier by leveraging reusable work (which is all that SaaS/PaaS/IaaS solutions are), then you accelerate your cashflow. If you have to wait 12 months for the IT group to get a private cloud working, you are effectively losing the benefit of your deliverable * 12 months (see the back-of-the-envelope sketch after this list). “We saved $10k/year on hosting costs!” “Great, can we deliver our product that will make us $10k/month now, or do we get to continue to put ourselves out of business with cost cutting?”

4. Account for complexity.  The problem with “hybrid cloud,” in most implementations, is that it’s not seamless from on prem to public, and therefore your app architecture has to be doubly complicated.  In a previous position where I ran a large SaaS service, we were spread across AWS (virtual everything) and Rackspace (vserver, F5 LBs, etc.) and it was a total nightmare – we were trying to migrate all the way out to the cloud just so we could delete half of the cruft in all our code that touched the infrastructure – complexity that caused production issues (frequently) and slowed our rate of delivering new functionality. The KISS principle is wrathful when ignored.
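And here’s the back-of-the-envelope sketch promised in point 3, using the toy numbers from the hosting-costs quip (all figures hypothetical, just to make the cost-of-delay math concrete):

```python
#!/usr/bin/env python
# Back-of-the-envelope cost of delay vs. hosting savings.

hosting_savings_per_year = 10000    # "we saved $10k/year on hosting!"
product_revenue_per_month = 10000   # what the delayed deliverable earns
months_of_delay = 12                # waiting on the private cloud build

cost_of_delay = product_revenue_per_month * months_of_delay
print("hosting savings: $%d/yr" % hosting_savings_per_year)
print("revenue forgone: $%d" % cost_of_delay)  # $120,000
print("net: $%d" % (hosting_savings_per_year - cost_of_delay))
```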

I’m not saying hybrid cloud, private cloud, etc. are never the answer – but I would say that on average they are usually not the right answer, and if you are using them as your default approach, then it’s better than even money you’re being inefficient. Furthermore, using SaaS and PaaS requires less expertise (and thus money) than IaaS, which requires less than private cloud – people justify “starting with private” because you are “leveraging skill sets” or whatever – and then six months later you have a whole team still trying to bake off OpenStack vs. Eucalyptus when you could have had your app (you know, the thing you actually need to fulfill a business goal) already running in a public PaaS. I’m not sure why I need to say out loud “delivering the most value with the least amount of effort, time, and expenditure is good” – but apparently I do. Just because you *can* do something does not mean you *should* do it. You need to carefully shepherd your time to delivery and your costs, and not just let things float in a morass of IT because “these things take time…”


AWS Dying! Rackspace Pulls Out Of Cloud! News At 11!

Boy, it’s been quite a week for the cloud-schadenfreude crowd. If you listen to the various news outlets, apparently Rackspace has given up on cloud and Amazon is in free-fall. Here are some representative hack-job pieces:

More accurate are these:

Let’s look at what’s actually going on.

First, Rackspace.  I was on the Spiceworks forum yesterday and the news is definitely being interpreted as “Rackspace is getting out of cloud, don’t consider them any more.” Now, it is their own fault for bungling the messaging here, but if you actually go look at what they are doing, at its heart they are making this change:

Rackspace Cloud will be sold only with a support contract now.

Yes, that’s it. That’s the change. Now it’s “managed cloud!” Which is fine; a heck of a lot of software I buy has mandatory maintenance contracts nowadays. But this doesn’t mean “Rackspace is leaving the cloud business!” They just want to add their “Fanatical Support™” to the value proposition and not compete purely on a bare-metal (bare-API?) “how much does a 2-CPU 4 GB server cost” basis.

Rackspace has to get back out in front of this messaging hard – it’s definitely made its way to the practitioner trenches as “they’re pulling out.” I mean, I have to say Rackspace’s strategy is pretty opaque to most folks, but this message misstep has graduated from “muddled and unclear” to “actively harmful.”

Now, Amazon.  The real story is:

Amazon Web Services only grew 38.39% last quarter.

For a large company that’s a pretty good growth rate, right, is yours higher? The press likes to turn IaaS into a 3 provider horse race. But so far – it’s not. Check out this recent (March 2014) Synergy Research graph.

(Figure: Synergy Research graph of Q4 2013 cloud infrastructure services market share.)

The fact of the matter is that Amazon is beating the holy hell out of everyone else in IaaS. It’s more neck and neck in PaaS, but sadly the entire PaaS market is still low (due to Joe Average IT Shop basically interpreting PaaS pitches as someone standing up and screaming “I’m a sorcerer!!!”).

IBM, HP, etc. don’t have credible offerings yet. I know they’re investing, I know they have roped some random companies that love them into doing it, but they are just not there yet. IBM is not a commodity company; they’re a “you have a billion dollar contract with us, we’re going to build out whatever we feel like with that” operation.

Google, same thing. It’s cool, it’s well priced, it’s dev friendly – but at the big price cut announcement, we had a big get-together at Capital Factory here in town. I looked around at the crowd of 40 clouderati types and said “OK, so who is comfortable running production apps on Google cloud yet?” Result: zero. Google’s throwing money at it but as with most of Google’s new offerings, it’s hard to trust it’s not just going to dry up tomorrow and get cancelled because they are running after private spaceships or whatever now, and nothing makes them money like their ad business so “it’s revenue generating” won’t save it. And Google is so bad at enterprise support…

Microsoft Azure was really good. Better than it had a right to be! I was very impressed with Azure in years 1 and 2. Execution was good (we used it for a SaaS service at National Instruments) and the vision was definitely “where the puck is going to be.” But post-Ozzie, it hasn’t exactly been shaking the sheets. At CloudAustin there was more Azure interest two years ago than there is now. They were going strong on dev friendliness and all, but trying to get into IaaS has been a distraction, and they just aren’t keeping pace with Amazon’s rate of new features: Docker support, SSDs, new instance types, vCenter integration, a Dropbox competitor, a desktop-as-a-service Citrix competitor…

Let me address the four big “why AWS is crashing and burning (despite being in an obvious position of market dominance)” points from the “Scorpion” article.

  1. AWS is not the low price provider.
    Eh. Not sure why this is relevant, and also not sure it’s true for what you are getting… It’s like saying “there are books cheaper than that book you just bought.” Well, sure there are, but do they have the information I want in them? See below for why not always having the lowest cent-per-minute price versus Google and Microsoft doesn’t really concern me.
  2. AWS is not the best product at anything – most of their features are mediocre knock offs of other products.
    This misses the point – their features are SIMPLER knockoffs of other products. That’s why it’s an accelerator. Dropbox and Salesforce and all the successful cloud entities have said “you know, some enterprise user left to their own devices is going to generate a list of 1000 requirements they don’t really need. Forget that. Let’s make the actual core functionality they need and leave off the rest so it’ll actually get used.” This is why they dominate the IaaS business. Many of their products are named to match. “SIMPLE email service.” “SIMPLE queue service.” “SIMPLE notification service.” This drives a new wave of architectural thought – instead of complicated services packed with stuff, what if instead I integrate simple, well-designed microservices? After doing a lot of cloud architecture work, those attributes are positives, not negatives.
  3. AWS is unbelievably lousy at support.
    I’m not sure I’d want to be in a race with Amazon, Microsoft, and Google to see who supports customers worse. I’m not sure I’ve ever been part of an enterprise happy with its Google support, and all my experiences with Microsoft support have been some Brazil-esque “you can’t actually ask them questions, only some VP who is a designated contact on the corporate contract can…”. Amazon is positioning themselves more like a hardware vendor: you don’t bother getting much support from them besides parts replacement; if you need more, you get it from whatever managed hosting provider or MSP sits on top of them.
  4. Once you are at $200k / month of spend, it’s cheaper and much more effective to build your own infrastructure
    This is frequently untrue and based on people not understanding the full costs of getting stuck in the infrastructure business. What’s your cost of delay? Average enterprise “wait for servers” time is about 6 weeks; assuming you’re not just using them for nothing, your ROI is delayed by that amount. And what about all the operation of those complex systems? You can’t just stick in the salary of the developers and sysadmins you’d need – stick in your revenue per employee instead, because that headcount could be doing something useful for your company instead of plumbing. Not to mention the cascading percentage of each layer of management’s time spent worrying about the plumbing and the plumbers instead of conducting the core business of the company. Cost of delay from lost agility and opportunity cost are never taken into account but definitely should be.

I know a lot of the old guard want cloud to dry up and go away, it bothers their lovely datacenters.  And some of the very new guard resent it because Amazon continues to be so successful – they keep up a rate of innovation that new players can’t disrupt. But this whole week of “the cloud is falling” news is complete BS, and won’t amount to much.


Amazon Cuts Prices Too

Well, if nothing else, I’m happy to have Google Cloud around to provide some competition to push Amazon Web Services. Immediately after Google announced dramatic price drops, Amazon responded by doing the same!

Now if they can only also shame them into dropping their whole crazy reserved instance scheme and going to progressive discounts like Google just did, the world will be better.


Google Cloud Update

We had a little get-together here in Austin today, sponsored by MomentumSI and hosted at Capital Factory (thanks to both!), to view the webcast of Google Cloud Platform’s newest product announcements. About 24 local engineers showed up to watch.

You can view the whole thing yourself here, or just read my notes from the event.

Cloud Is Hard

Their thesis statement was that cloud, while cool, is still too hard for many people, hindering adoption or slowing innovation. So they’ve worked on making it easier.

Cost

Cost calculation is super complex (reserve, on demand, etc.). They talk about “other industry standard clouds” by which they mean Amazon Web Services. They note the drawbacks to reserved instances, which I am all totally in agreement on (see my earlier article Why Amazon Reserve Instances Torment Me for more on that). Specifically they note that reservations constrain your design choices, which is one of the great reasons to go to the cloud in the first place – Amen, brother!

Though cloud prices have been dropping 6-8% a year, hardware’s been dropping 20-30%. Why is Moore’s Law not translating into more sweet green in our pockets? It should, they contend. Thus, they are announcing on demand price drops:

  • GCE: 32% price drop
  • Storage is now $0.026/GB-month for any use
  • $0.02/GB-month for reduced-durability storage
  • BigQuery: 85% price reduction
  • You can now purchase predictable throughput

Introducing sustained-use discounts: no pre-planning or reserving ahead of time; instead, prices automatically drop as VM usage is sustained past 25% of the month, and progressively from there. 100% use works out to a 53% discount versus the old prices (remember that includes the new 32% reduction, so another 21 points off the old price for sustained use). With linear machine cost scaling, this makes it simple(r) to predict and calculate your costs.
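Checking the arithmetic on that (my reading of the numbers, not Google’s math): if the 53% is measured against the pre-cut prices, the sustained-use piece works out to roughly 30% off the new on-demand rate, multiplicatively:

```python
#!/usr/bin/env python
# Sanity-checking the sustained-use discount arithmetic.

old_price = 1.00
cut_price = old_price * (1 - 0.32)       # after the new 32% on-demand drop
full_use_price = old_price * (1 - 0.53)  # at 100% sustained use

extra_discount = 1 - full_use_price / cut_price
print("after 32%% cut: %.2f" % cut_price)        # 0.68
print("at 100%% use:   %.2f" % full_use_price)   # 0.47
print("extra sustained-use discount: %.0f%%" % (extra_discount * 100))
# -> about 31% off the already-cut on-demand rate
```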

Other Tradeoffs

Current cloud (hint: AWS) forces other tradeoffs: time to market vs. scalability, flexibility (IaaS) vs. automatic management (PaaS), big data vs. realtime data analysis.

But first, we interrupt our messaging to talk about other random new features based on customer feedback. To wit:

  • SuSE/Red Hat support
  • Windows Server 2008 R2 (preview) support
  • Cloud DNS service, accessible via API and console

The features are nice but even nicer was that they implemented these based on customer feedback, which means they consider this a real product with real customers and not just a fun tech thing for their own ends (which to be fair 80% of Google’s offerings are, and it can be hard to tell the difference).

Time to Market vs Scalability

So on scaling… You need deployment! Troubleshooting! Use tools you know!
  • They have a new “gcloud” command line tool.
  • “gcloud init” pulls down the app via git; you can just edit, git commit, git push.
  • They have an integrated build service – it spins up a Jenkins/Maven build and deploys, and you can see release status in the console.
  • There’s also a new unified logs viewer with basic searching – like a junior Splunk, with one cool dev feature: click on the code in the stack trace and you’re put directly into the code in the console’s source view. Fix and commit, it auto-builds, bam, you’re fixed.

IaaS vs PaaS

A new halfway state: “managed VMs.” It’s the normal PaaS, but in the config you can tell it things to apt-get install onto the instances, so you can have more third-party software than the PaaS previously allowed. Also, you can “enable debugging” on an instance and then log in interactively.

Big Data vs Realtime Data Analysis

They’ve upped BigQuery to a 100k rows/sec ingest rate.
Example demo: smart monitoring of 60 events/hour from 400k Glen Canyon power meters (17bn events/mo), with about 128k records. They showed a visualization updating in near real time with all those meters geolocated; you can click on one to get realtime data.
He showed a complex BigQuery “big join” to filter meters by lat/long from a separate table and then by quartile across the whole population. “Doing this in NoSQL would be impossible or very slow.”

They will be doing a Google Cloud roadshow soon – see cloud.google.com/roadshow – it looks like Austin will be on the list of cities!

Analysis

The good thing about getting a bunch of techies together to view this was the discussion afterwards.  The general sentiment was that:

1. The cost drops are nice, and the sustained-use approach to discounts is much better. The reserved instance scheme is one of the worst things about AWS, and if this drives them to adopt the same model, hooray!

2. The other new features (managed VMs, gcloud) are definitely nice. They are focusing on dev-friendliness in their discussion, but it’s a lot less clear how to operate this. If you’re really trying to stitch together a bunch of microservices, there’s not a lot of great support for that. They talk about using their PaaS and say “of course, if you use our PaaS you don’t need to carry a pager! You’d only need to do that if you’re doing IaaS and maintaining your own OSes.” That is dangerously naive and really made the whole group skittish. Most people there have done “play” things in Google’s cloud but are reticent to put mission-critical items there, and this section of the presentation didn’t do a lot to improve that.

3. The BigQuery/realtime demo was impressive and multiple people would like to kick the tires on it.

Overall – it was a little light, but it was a keynote; the new features/pricing are all good; this shows more Google commitment to their cloud as a product but actual concerns still linger about maturity and suitability for realistically complex revenue-generating production applications.

 


re:Invent – Fireside Chat: Part 1

One of the interesting sessions at re:Invent was a fireside chat with Werner Vogels, where CEOs and CTOs of different companies/startups who use AWS talked about their applications/platforms and what they liked and wanted from AWS. It was a three-part series with different folks; I was able to attend the first one, but I’m guessing videos of the others are available online. It was an interesting session, giving the audience a window into the way C-level people think about problems and solutions…

First up, the CTO of MongoDB…

Lots of people use Mongo to store things like user profiles for their applications. Mongo performance has gotten a lot better because of SSDs.

Recently funded to the tune of $150 million, and wanting to build out a lot of tools to administer Mongo better.

Apparently being a MongoDB DBA is a really high-paying job these days!

User roles may be available in Mongo next year to add more security.

Werner and Eliot want to work together on bringing a hosted version of Mongo, like RDS.

Next up, Twilio’s Jeff Lawson

Jeff is ex-Amazon.


Software people want building blocks, not some crazy monolithic thing to solve a problem. Telecom had this issue, and that is why I started Twilio.

Everyone is agile! We don’t have answers up front, but we figure out these answers as we go.

Started with voice, then moved to SMS, followed by a global presence. Most of our customers wanted something without boundaries – just an API to communicate with their customers.

Werner: It’s hard to run an API business. Tell us more…
Lawson: It is really hard. APIs are kinda like webapps when it comes to scaling. REST helps a lot from this perspective. Multi-tenancy issues get amplified when you have an API business.

Twilio apparently deploys 20 times a day. AWS really helps with deployment because you can bring up brand new environments that look exactly like prod and then tear them down when they aren’t needed.

When it comes to APIs, we write the documentation first and show it to our customers before actually implementing the API. Then iterate, iterate, iterate on the development.

Jeff’s ask: make it easier to get a VPC up and running.

Next up: Valentino with AdRoll (realtime bidding)


There’s a data collection pipe which gets about 20 TB of data every day.

Latency is king: typical latencies are between 50ms and 100ms. This is still a lot for us. I wish we had more transparency when it comes to latency inside AWS and otherwise…

Why DynamoDB? We didn’t find anything simpler at the time, and it was nice to be able to scale without having to worry about it. We had zero ops people back then to work on scaling.

Read/write rates: 80k reads per second (eventually consistent), 40k writes per second.

Why Erlang? You’re a Python god.
I started working in Python with the Twisted framework, but I realized Python didn’t fit our use case well; Twisted worked just as well, but it would have been complicated to manage and needed a bunch of hacks.

Today it would be hard to pick between Erlang and Go…
