Tag Archives: Cloud Computing

by Ernest Mueller | September 13, 2011 · 9:55 am

What Is Cloud Computing?

My recent post on how sick I am of people being confused by the basic concept of cloud computing quickly brought out the comments on “what cloud is” and “what cloud is not.” And the truth is, it’s a little messy; there’s not a clear definition, especially across “the three aaSes.” So now let’s have a post for the advanced students. Chip in with your thoughts!

Here’s my Grand Unified Theory of Cloud Computing. Rather than being a legalistic definition that will always be wrong for some instances of cloud, it attempts to convey the history and related concepts that inform the cloud.

The Grand Unified Theory of Cloud Computing

( ISP -> colo -> MSP ) + virtualization + HPC + ( AJAX + SOAP -> REST APIs ) = IaaS
( web site -> web app -> ASP ) + virtualization + fast ubiquitous Internet + [ RIA browsers & mobile ] = SaaS
( IDEs & 4GLs ) + ( EAI -> SOA ) + SaaS + IaaS = PaaS
[ IaaS | PaaS | SaaS ] + [ devops | open source | noSQL ] = cloud

* Note: I don’t agree with all those Wikipedia definitions; they are only linked to clue in people unsure about a given term.

Sure, that’s where the cloud comes from, but “what is the cloud?” Well, here are my thoughts: the Seven Pillars of Cloud Computing. Having more of these makes something “more cloudy” and having fewer makes something “less cloudy.” Arguments over whether some specific offering “is cloud” or not, however, are for people without sufficiently challenging jobs.

The Seven Pillars of Cloud Computing

“The Cloud” may be characterized as:

An outsourced managed service
providing hosted computing or functionality
delivered over the Internet
offering extreme scalability
by using dynamically provisioned, multitenant, virtualized systems, storage, and applications
controlled via REST APIs
and billed in a utility manner.

You can remove one or more of these pillars to form most of the things people sell you as “private cloud,” for example, losing specific cloud benefits in exchange for other concerns.

Now there’s also the new vs. old argument. There are the technohipsters who say “Cloud is nothing new, I was doing that back in the ’90s.” And some of that is true, but only in the most uninteresting way. The old and the new have, via alchemy, begun to help users realize benefits beyond what they did before.

Benefits of Cloud – What and How

Not New:

Virtualization
Outsourcing
Integration
Intertubes

Pretty New:

Multitenancy
Massively scalable
Elastic self provisioning
Pay as you go

Resulting Benefits:

agility
economy of scale
low initial investment
scalable cost
resilience
improved service delivery
universal access

Okay, Clouderati – what do you think?


by Ernest Mueller | November 18, 2010 · 1:27 pm

Austin Cloud User Group Nov 17 Meeting Notes

This month’s ACUG meeting was cooler than usual – instead of having one speaker talk on a cloud-related topic, we had multiple group members do short presentations on what they’re actually doing in the cloud. I love talks like that; it’s where you get real rubber-meets-the-road takeaways.

I thought I’d share my notes on the presentations. I’ll write up the one we did separately, but I got a lot out of these:

OData to the Cloud, by Craig Vyal from Pervasive Software
Moving your SaaS from Colo to Cloud, by Josh Arnold from Arnold Marziani (previously of PeopleAdmin)
DevOps and the Cloud, by Chris Hilton from Thoughtworks
Moving Software from On Premise to SaaS, by John Mikula from Pervasive Software
The Programmable Infrastructure Environment, by Peco Karayanev and Ernest Mueller from National Instruments (see next post!)

My editorial comments are in italics. Slides are linked into the headers where available.

OData to the Cloud

OData was started by Microsoft (“but don’t hold that against it”) under the Open Specification Promise. Craig did an implementation of it at Pervasive.

It’s a RESTful protocol for CRUDdy GET/POST/DELETE of data. Uses AtomPub-based feeds and returns XML or JSON. You get the schema and the data in the result.

You can create an OData producer of a data source, consume OData from places that support it, and view it via stuff like iPhone/Android apps.

Current producers – Sharepoint, SQL Azure, Netflix, eBay, twitpic, Open Gov’t Data Initiative, Stack Overflow

Current consumers – PowerPivot in Excel, Sesame, Tableau. Libraries for Java (OData4J), .NET 4.0/Silverlight 4, OData SDK for PHP

It is easier for a “business user” to consume than SOAP or REST. Craig used OData4J to create a producer for the Pervasive product.

Questions from the crowd:

Compression/caching? Nothing built in. Though normal HTTP level compression would work I’d think. It does “page” long lists of results and can send a section of n results at a time.

Auth? Your problem. Some people use OAuth. He wrote a custom GlassFish basic HTTP auth portal.

Competition? GData is kinda like this.

Seems to me it’s one part REST, one part “making you have a DTD for your XML”. Which is good! We’re very interested in OData for our data centric services coming up.
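To make the paging answer above a bit more concrete, here is a minimal sketch of how a client might ask an OData producer for one page of results at a time. It relies on the standard OData system query options `$top`, `$skip`, and `$format`; the service root URL, the `Titles` entity set, and the function name are placeholders made up for illustration, not anything shown in the talk.

```python
from urllib.parse import urlencode

# Hypothetical service root; substitute a real OData producer's URL.
SERVICE_ROOT = "https://example.com/odata/Catalog.svc"

def odata_page_url(entity_set, page, page_size=20, fmt="json"):
    """Build the URL for one page of an OData entity set."""
    options = {
        "$top": page_size,           # number of results in this page
        "$skip": page * page_size,   # results to skip before this page
        "$format": fmt,              # ask for JSON rather than the Atom default
    }
    # Keep "$" literal so the system query options stay readable.
    return f"{SERVICE_ROOT}/{entity_set}?{urlencode(options, safe='$')}"

print(odata_page_url("Titles", page=2))
```

The sketch only builds the URL; fetching it and handling auth are left out, since (as noted above) auth is your problem.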

Moving your SaaS from Colo to Cloud

Josh Arnold was from PeopleAdmin, now he’s a tech recruiter, but can speak to what they did before he left. PeopleAdmin was a Sungard type colo setup. Had a “rotting” out of country DR site.

They were rewriting their stack from java/mssql to ruby/linux.

At the time they were spending $15k/mo on the colo (not including the cost of their HW). Amazon estimated cost was 1/3 that but really they found out after moving it’s 1/2. What was the surprise cost? Lower than expected perf (disk io) forced more instances than physical boxes of equivalent “size.”
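As a quick worked version of those numbers, here is a small sketch. The $15k/month figure and the 1/3 and 1/2 fractions come from the talk; the names are made up for illustration, and the fractions are round figures, so treat the results as approximate.

```python
COLO_MONTHLY_USD = 15_000  # colo bill from the talk, hardware not included

def cloud_cost(divisor):
    """Monthly cloud bill when it is 1/divisor of the colo bill."""
    return COLO_MONTHLY_USD / divisor

estimated = cloud_cost(3)          # the up-front estimate: 1/3 of colo
actual = cloud_cost(2)             # what they found after moving: 1/2 of colo
overrun = actual / estimated - 1   # how far the estimate was off

print(f"estimated ${estimated:,.0f}/mo, actual ${actual:,.0f}/mo, overrun {overrun:.0%}")
```

So the move still halved the monthly bill, but the real cost came in about 50% above the estimate, which is the gap the extra instances account for.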

Flexible provisioning and autoscaling were great; the colo couldn’t scale fast enough. How do you scale?

The cloud made having an out of country DR site easy, and kept it from rotting and getting old.

Question: What did you lose in the move? We were prepared for mental “control issues” so didn’t have those. There’s definitely advanced functionality (e.g. with firewalls) and native hardware performance you lose, but that wasn’t much.

They evalled Rackspace and Amazon (cursory eval). They had some F5s they wanted to use and the ability to mix in real hardware was tempting but they mainly went straight to Amazon. Drivers were the community around it and their leadership in the space.

Timeline was 2 years (rewrite app, slowly migrate customers). It’ll be more like 3-4 before it’s done. There were issues where they were glad they didn’t mass migrate everyone at once.

Technical challenges:

Performance was a little lax (disk performance, they think) and they ended up needing more servers. Used tricks like RAIDed EBSes to try to get the most io they could (mainly for the databases).

Every customer had a SSL cert, and they had 600 of them to mess with. That was a problem because of the 5 Elastic IP limit. Went to certs that allow subsidiary domains – Digicert allowed 100 per cert (other CAs limit to much less) so they could get 100 per IP.

App servers did outbound LDAP conns to customer premise for auth integration and they usually tried to allow those in via IP rules in their corporate firewalls, but now on Amazon outbound IPs are dynamic. They set up a proxy with a static (elastic) IP to route all that through.

Rightscale – they used it. They like it.

They used nginx for the load balancing, SSL termination. Was a single point of failure though.
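For the SSL certificate challenge above, the arithmetic can be sketched as follows. The figures (600 customer certificates, 100 names per certificate) come from the talk; the function name is made up for illustration, and the sketch assumes one multi-domain certificate per IP address.

```python
import math

def ips_needed(customer_domains, names_per_cert):
    """One multi-domain cert per IP, so IPs needed equals certs needed."""
    return math.ceil(customer_domains / names_per_cert)

print(ips_needed(600, 1))    # one cert (and one IP) per customer
print(ips_needed(600, 100))  # 100 customer names per cert
```

Going from one name per certificate to 100 drops the requirement from 600 addresses to 6, which is at least in the neighborhood of the 5 Elastic IP limit. The notes don’t record how the remaining gap was handled.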

Remember that many of the implementations you are hearing about now were started back before Rackspace had an API, before Amazon had load balancers, etc.

In discussion about hybrid clouds, the point was brought up that a lot of providers talk about it – GoGrid, OpSource, Rackspace – but often there are gotchas.

DevOps and the Cloud

Chris Hilton from Thoughtworks is all about the DevOps, and works on stuff like continuous deployment for a living.

DevOps is:

collaboration between devs and operations staff
agile sysadmin, using agile dev tools
dev/ops/qa integration to achieve business goals

Why DevOps?

Silos. Agile dev broke down the wall between dev/qa (and biz).

Devs are usually incentivized for change, and ops are incentivized for stability, which creates an innate conflict.

But if both are incentivized to deliver business value instead…

DevOps Practices

version control!
automated provisioning and deployment (Puppet/Chef/rPath)
self healing
monitoring infra and apps
identical environments dev/test/prod
automated db mgmt

Why DevOps In The Cloud?

cloud requires automation, devops provides automation

References

“Continuous Delivery” Humble and Farley
Rapid and Reliable Releases InfoQ
Refactoring Databases by Ambler and Sadalage

Another tidbit: they’re writing a “Puppet lite” in PowerShell to fill the tool gap – some tool suppliers are starting, but the general degree of tool support for people who use both Windows and Linux is shameful.

Moving Software from On Premise to SaaS

John Mikula of Pervasive tells us about the Pervasive Data Cloud. They wanted to take their on premise “Data Integrator” product, basically a command line tool ($, devs needed to implement), to a wider audience.

Started 4 years ago. They realized that the data sources they’re connecting to and pumping to, like QuickBooks Online, Salesforce, etc., are all SaaS from the get go. “Well, let’s make our middle part the same!”

They wrote a Java EE wrapper and put it on Rackspace colo initially.

It gets a customer’s metadata and puts it on a queue; another system takes it off and processes it. A very scaling-friendly architecture. And Rackspace (colo) wasn’t scaling fast enough, so they moved it to Amazon.

Their initial system had 2 GlassFish front ends and 25 workers.

For queuing, they tried Amazon SQS but it was limited, then went to Apache ZooKeeper.
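The queue-and-workers shape described above can be sketched in a few lines. This is only an in-process stand-in using Python’s standard library, not Pervasive’s implementation: the real system used separate front-end and worker machines with an external queue, and the job names and the `processed:` step here are placeholders.

```python
import queue
import threading

def run_jobs(jobs, worker_count=3):
    """Push jobs onto a queue and let a pool of workers drain it."""
    pending = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:                  # sentinel: no more work
                pending.task_done()
                return
            processed = f"processed:{job}"   # stand-in for the real integration work
            with lock:
                results.append(processed)
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:          # front end: enqueue each customer's metadata
        pending.put(job)
    for _ in threads:         # one sentinel per worker so each one exits
        pending.put(None)
    pending.join()
    for t in threads:
        t.join()
    return sorted(results)

print(run_jobs(["customer-a", "customer-b", "customer-c", "customer-d"]))
```

The point of the shape is that the front ends and the workers only share the queue, so either side can be scaled by changing a count (here, `worker_count`) without touching the other.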

First effort was about “deploy a single app” – namely Salesforce/QuickBooks integration. Then they made a domain specific model and refactored and made an API to manage the domain specific entities so new apps could be created easily.

Recommended approach – solve easy problems and work from there. That’s more than enough for people to buy in.

Their core engine’s not designed for multitenancy – have batches of workers for one guy’s code – so their code can be unsafe but it’s in its own bucket and doesn’t mess up anyone else.

Changing internal business processes in a mature company was a challenge – moving from perm license model to per month just with accounting and whatnot was a big long hairy deal.

Making the API was rough. His estimate of a couple months grew to 6. Requirements gathering was a problem, very iterative. They weren’t agile enough – they only had one interim release and it wasn’t really usable; if they did it again they’d do the agile ‘right thing’ of putting out usable milestones more frequently to see what worked and what people really needed.

In Closing

Whew! I found all the presentations really engaging and thank everyone for sharing the nuts and bolts of how they did it!

Velocity 2010 – Dueling Cloud Management Suppliers

Two cloud systems management suppliers talk about their bidness! My comments in italics.

Cloud Autoscaling in Enterprise Computing by George Reese (enStratus Networks LLC)

How the Top Social Games Scale on the Cloud by Michael Crandell (RightScale, Inc)

I am more familiar with RightScale, but just read Reese’s great Cloud Application Architectures book on the plane here. Whose cuisine will reign supreme?

enStratus

Reese starts talking about “naive autoscaling” being a problem. The cloud isn’t magic; you have to be careful. He defines “enterprise” autoscaling as scaling that is cognizant of financial constraints and not this hippy VC-funded Twitter type nonsense.

Reactive autoscaling is done when demand exceeds the system’s provisioned resources. Proactive autoscaling is done in response to capacity planning – “run more during the day.”

Proactive requires planning. And automation needs strict governors in place.

In our PIE autoscaling, we have built limits like that into the model – kinda like any connection pool. Min, max, rate of increase, etc.
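As a rough illustration of what such governors look like, here is a minimal sketch, not the actual PIE code. The function name and the default limits are invented for the example; the idea is simply that whatever the autoscaler wants, the result is clamped by a minimum, a maximum, and a maximum change per step.

```python
def governed_target(current, desired, minimum=2, maximum=20, max_step=2):
    """Clamp a desired server count to min/max bounds and a per-step rate limit."""
    # Limit how far a single scaling decision may move the fleet.
    step = max(-max_step, min(max_step, desired - current))
    # Then keep the result inside the absolute floor and ceiling.
    return max(minimum, min(maximum, current + step))

print(governed_target(current=4, desired=50))   # rate limit applies
print(governed_target(current=19, desired=50))  # ceiling applies
print(governed_target(current=3, desired=0))    # floor applies
```

The floor doubles as the minimum safe redundancy number for a tier, and the rate limit is what keeps a bad metric from launching fifty servers in one go.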

He says your controls shouldn’t be all “number of servers,” but be “budget” based. Hmmm. That’s ideal but is it too ideal? And so what do you do, shut down all your servers if you get to the 28th of the month and you run out of cash?

CPU is not a scaling metric. Have better metrics tied to things that matter like TPS/response time. Completely agree there; scaling just based on CPU/memory/disk is primitive in the extreme.
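Here is one way scaling on response time rather than CPU might look. It is a sketch under a crude assumption that is mine, not Reese’s: that response time scales roughly in proportion to load per server, so the fleet is resized by the ratio of observed to target latency. The target, the tolerance band, and the function name are all invented for illustration.

```python
import math

def desired_servers(current, p95_ms, target_ms=300.0, tolerance=0.2):
    """Suggest a server count from observed response time versus a target."""
    ratio = p95_ms / target_ms
    # Inside the tolerance band, leave the fleet alone to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current
    # Otherwise resize proportionally, never dropping below one server.
    return max(1, math.ceil(current * ratio))

print(desired_servers(current=10, p95_ms=600))  # twice the target latency
print(desired_servers(current=10, p95_ms=310))  # within tolerance
print(desired_servers(current=10, p95_ms=150))  # half the target latency
```

In practice the suggestion would still be passed through min/max and rate-of-increase limits before anything is launched or terminated.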

Efficiency is a key cloud metric. Get your utilization high.

Here’s where I kinda disagree – it can often be penny wise and pound foolish. In the name of “efficiency” I’ve seen people put a bunch of unrelated apps on one server and cause severe availability problems. Screw utilization. Or use a cloud provider that uses a different charging model – I forget which one it was, but we had a conf call with one cloud provider that only charged on CPU used, not “servers provisioned.”

Of course you don’t have to take it to an extreme, just roll down to your minimum safe redundancy number on a given tier when you can.

Security – well, you tend not to do some centralized management things (like add to Active Directory) in the cloud. It makes user management hard. Or just makes you use LDAP, like God intended.

Cloud bursting – scaling from on premise into the cloud.

Case study – a diaper company. Had a loyalty program. It exceeded capacity within an hour of launch. Humans made a scaling decision to scale at the load balancing tier, and enStratus executed the auto-scale change. They checked it was valid traffic and all first.

But is this too fiddly for many cases? If you are working with a “larger than 5 boxes” kind of scale don’t you really want some more active automation?

RightScale

The RightScale blog is full of good info!

They run 1.2 million cloud servers! They see things like 600k concurrent users, 100x scaling in 4 days, 15k instances, 1:2000 management ratio…

Now about gaming and social apps. They power the top 10 Facebook apps. They are an open management environment that lives atop the cloud suppliers’ APIs.

Games have a natural lifecycle where they start small, maybe take off, get big, eventually taper off. It’s not a flat demand curve, so flat supply makes no sense.

During the early phase, game publishers need a cheap, fast solution that can scale. They use Chef and other stuff in server templates for dynamic boot-time configuration.

Typically, game server side tech looks like normal Web stuff! Apache+HAProxy LB, app servers, db cache (memcached), db (sharded MySQL master/slave pairs). Plus search, queues, admin, logs.

Instance types – you start to see a lot of larger instances – large and extra large. Is this because of legacy comfort issues? Is it RAM needs?

CentOS 5 dominates! Generic images, configured at boot. One company rebundles for faster autoscale. Not much Ubuntu or Windows. To be agile you need to do that realtime config.

A lot of the boxes are used for databases. Web/app and load balancing significant too. There’s a RightScale paper showing a 100k packets per second LB limit with Amazon.

People use autoscaling a lot, but mainly for web app tier. Not LBs because the DNS changing is a pain. And people don’t autoscale their DBs.

They claim a lot lower human need on average for management on RightScale vs using the APIs “or the consoles.” That’s a big or. One of our biggest gripes with RightScale is that they consume all those lovely cloud APIs and then just give you a GUI and not an API. That’s lame. It does a lot of good stuff but then it “terminates” the programmatic relationship. [Edit: Apparently they have a beta API now, added since we looked at them.]

He disagrees with Reese – the problem isn’t that there is too much autoscaling, it’s that it has never existed. I tend to agree. Dynamic elasticity is key to these kinds of business models.

If your whole DB fits into memcache, what is MySQL for? Writes sometimes? NoSQL sounds cool but in the meantime use memcache!!!

The cloud has enabled things to exist that wouldn’t have been able to before. Higher agility, lower cost, improved performance with control, new levels of resiliency and automation, and full lifecycle support.


by Ernest Mueller | April 16, 2010 · 9:11 am

Amazon Web Services – Convert To/From VMs?

In the recent Amazon AWS Newsletter, they asked the following:

Some customers have asked us about ways to easily convert virtual machines from VMware vSphere, Citrix Xen Server, and Microsoft Hyper-V to Amazon EC2 instances – and vice versa. If this is something that you’re interested in, we would like to hear from you. Please send an email to aws-vm@amazon.com describing your needs and use case.

I’ll share my reply here for comment!

This is a killer feature that allows a number of important activities.

1. Product VMs. Many suppliers are starting to provide third-party products in the form of VMs instead of software to ease install complexity, or in an attempt to move from a hardware appliance approach to a more-software approach. This pretty much prevents their use in EC2. <cue sad music> As opposed to “Hey, if you can VM-ize your stuff then you’re pretty close to being able to offer it as an Amazon AMI or even SaaS offering.” <schwing!>

2. Leveraging VM Investments. For any organization that already has a VM infrastructure, it allows for reduction of cost and complexity to be able to manage images in the same way. It also allows for the much promised but under-delivered “cloud bursting” theory where you can run the same systems locally and use Amazon for excess capacity. In the current scheme I could make some AMIs “mostly” like my local VMs – but “close” is not good enough to use in production.

3. Local testing. I’d love to be able to bring my AMIs “down to me” for rapid redeploy. I often find myself having to transfer 2.5 gigs of software up to the cloud, install it, find a problem, have our devs fix it and cut another release, transfer it up again (2 hour wait time again, plus paying $$ for the transfer)…

4. Local troubleshooting. We get an app installed up in the cloud and it’s not acting quite right and we need to instrument it somehow to debug. This process is much easier on a local LAN with the developers’ PCs with all their stuff installed.

5. Local development. A lot of our development exercises the Amazon APIs. This is one area where Azure has a distinct advantage and can be a threat; in Visual Studio there is a “local Azure fabric” and a dev can write their app and have it running “in Azure” but on their machine, and then when they’re ready deploy it up. This is slightly more than VM consumption, it’s VMs plus Eucalyptus or similar porting of the Amazon API to the client side, but it’s a killer feature.

Xen or VMware would be fine – frankly this would be big enough for us that I’d change virtualization solutions to the one that worked with EC2.

I just asked one of our developers for his take on value for being able to transition between VMs and EC2 to include in this email, and his response is “Well, it’s just a no-brainer, right?” Right.


by Ernest Mueller | March 5, 2010 · 8:49 am

Microsoft Azure for Dummies – or for Smarties?

What Is Microsoft Azure?

I’m going to attempt to explain Microsoft Azure in “normal Web person” language. Like many of you, I am more familiar with Linux/open source type solutions, and like many of you, my first forays into cloud computing have been with Amazon Web Services. It can often be hard for people not steeped in Redmondese to understand exactly what the heck they’re talking about when Microsoft people try to explain their offerings. (I remember a time some years ago I was trying to get a guy to explain some new Microsoft data access thing with the usual three letter acronym name. I asked, “Is it a library? A language? A protocol? A daemon? Branding? What exactly is this thing you’re trying to get me to uptake?” The reply was invariably “It’s an innovative new way to access data!” Sigh. I never did get an answer and concluded “Never mind.”)

Microsoft has released their new cloud offering, Azure. Our company is a close Microsoft partner since we use a lot of their technologies in developing our company’s desktop software products, so as “cloud guy” I’ve gotten some in depth briefings and even went to PDC this year to learn more (some of my friends who have known me over the course of my 15 years of UNIX administration were horrified). “Cloud computing” is an overloaded enough term that it’s not highly descriptive and it took a while to cut through the explanations to understand what Azure really is. Let me break it down for you and explain the deal.

Point of Comparison: Amazon (IaaS)

In Amazon EC2, as hopefully everyone knows by now, you are basically given entire dynamically-provisioned, hourly-billed virtual machines that you load OSes on and install software and all that. “Like servers, but somewhere out in the ether.” Those kinds of cloud offerings (e.g. Amazon, Rackspace, most of them really) are called Infrastructure As A Service (IaaS). You’re responsible for everything you normally would be, except for the data center work. Azure is not an IaaS offering but still bears a lot of similarities to Amazon; I’ll get into details later.

Point of Comparison: Google App Engine (PaaS)

Take Google’s App Engine as another point of comparison. There, you just upload your Python or Java application to their portal and “it runs on the Web.” You don’t have access to the server or OS or disk or anything. And it “magically” scales for you. This approach is called Platform as a Service (PaaS). They provide the full platform stack, you only provide the end application. On the one hand, you don’t have to mess with OS level stuff – if you are just a Java programmer, you don’t have to know a single UNIX (or Windows) command to transition your app from “But it works in Eclipse!” to running on a Web server on the Internet. On the other hand, that comes with a lot of limitations that the PaaS providers have to establish to make everything play together nicely. One of our early App Engine experiences was sad – one of our developers wrote a Java app that used a free XML library to parse some XML. Well, that library had functionality in it (that we weren’t using) that could write XML to disk. You can’t write to disk in App Engine, so its response was to disallow the entire library. The app didn’t work and had to be heavily rewritten. So it’s pretty good for code that you are writing EVERY SINGLE LINE OF YOURSELF. Azure isn’t quite as restrictive as App Engine, but it has some of that flavor.

Azure’s Model

Windows Azure falls between the two. First of all, Azure is a real “hosted cloud” like Amazon Web Services, like most of us really think about when we think cloud computing; it’s not one of these on premise things that companies are branding as “cloud” just for kicks. That’s important to say because it seems like nowadays the larger the company, the more they are deliberately diluting the term “cloud” to stick their products under its aegis. Microsoft isn’t doing that, this is a “cloud offering” in the classical (where classical means 2008, I guess) sense.

However, in a number of important ways it’s not like Amazon. I’d definitely classify it as a PaaS offering. You upload your code to “Roles” which are basically containers that run your application in a Windows 2008(ish) environment. (There are two types – a “Web role” has a stripped down IIS provided on it, a “Worker role” doesn’t – the only real difference between the two.) You do not have raw OS access, and cannot do things like write to the registry. But, it is less restrictive than App Engine. You can bundle up other stuff to run in Azure – even run Java apps using Apache Tomcat. You have to be able to install whatever you want to run “xcopy only” – in other words, no fancy installers, it needs to be something you could just copy the files to a Windows PC, without administrative privilege, and run a command from the command line and have it work. Luckily, Tomcat/Java fits that description. They have helper packs to facilitate doing this with Tomcat, memcached, and Apache/PHP/MediaWiki. At PDC they demoed Domino’s Pizza running their Java order app on it and a WordPress blog running on it. So it’s not only for .NET programmers. Managed code is easier to deploy, but you can deploy and run about anything that fits the “copy and run command line” model.

I find this approach a little ironic actually. It’s been a lot easier for us to get the Java and open source (well, the ones with Windows ports) parts of our infrastructure running on Azure than Windows parts! Everybody provides Windows stuff with an installer, of course, and you can’t run installers on Azure. Anyway, in its core computing model it’s like Google App Engine – it’s more flexible than that (good) but it doesn’t do automatic scaling (bad). If it did autoscaling I’d be willing to say “It’s better than App Engine in every way.”

In other ways, it’s a lot like Amazon. They offer a variety of storage options – blobs (like S3), tables (like SimpleDB), queues (like SQS), drives (like EBS), SQL Azure (like RDS). They have an integral CDN. They do hourly billing. Pricing is pretty similar to Amazon – it’s hard to totally equate apples to apples, but Azure compute is $0.12/hr and an Amazon small Windows image compute is $0.12/hr (Coincidence? I think not.). And you have to figure out scaling and provisioning yourself on Amazon too – or pay a lot of scratch to one of the provisioning companies like RightScale.

What’s Unique and Different

Well, the largest thing that I’ve already mentioned is the PaaS approach. If you need OS level access, you’re out of luck; if you don’t want to have to mess with OS management, you’re in luck! So to the first order of magnitude, you can think of Azure as “like Amazon Web Services, but the compute uses more of a Google App Engine model.”

But wait, there’s more!

One of the biggest things that Azure brings to the table is that, using Visual Studio, you can run a local Azure “fabric” on your PC, which means you can develop, test, and run cloud apps locally without having to upload to the cloud and incur usage charges. This is HUGE. One of the biggest pains about programming for Amazon, for instance, is that if you want to exercise any of their APIs, you have to do it “up there.” Also, you can’t move images back and forth between Amazon and on premise. Now, there are efforts like EUCALYPTUS that try to overcome some of this problem but in the end you pretty much just have to throw in the towel and do all dev and test up in the cloud. Amazon and Eclipse (and maybe Xen) – get together and make it happen!!!!

Here’s something else interesting. In a move that seems more like a decision from a typical cranky cult-of-personality open source project, they have decided that proper Web apps need to be asynchronous and message-driven, and by God that’s what you’re going to do. Their load balancers won’t do sticky sessions (only round robin) and time out all connections between all tiers after 60 seconds without exception. If you need more than that, tough – rewrite your app to use a multi-tier message queue/event listener model. Now on the one hand, it’s hard for me to disagree with that – I’ve been sweating our developers, telling them that’s the correct best-practice model for scalability on the Web. But again you’re faced with the “Well what if I’m using some preexisting software and that’s not how it’s architected?” problem. This is the typical PaaS pattern of “it’s great, if you’re writing every line of code yourself.”
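To show what that message-driven rewrite amounts to, here is a minimal sketch of the pattern in plain Python. It is not Azure API code: the in-memory queue and dictionary stand in for a durable message queue and shared storage, and every name is hypothetical. The request handler returns immediately with a job ID, a worker does the slow part, and any web instance can answer a later status poll.

```python
import itertools
import queue

jobs = queue.Queue()      # stand-in for a durable message queue
status = {}               # stand-in for shared storage visible to every instance
_ids = itertools.count(1)

def submit(payload):
    """Web tier: accept the request, enqueue it, and return right away."""
    job_id = next(_ids)
    status[job_id] = {"state": "queued", "result": None}
    jobs.put((job_id, payload))
    return job_id

def work_one():
    """Worker tier: take one message off the queue and record the outcome."""
    job_id, payload = jobs.get()
    status[job_id] = {"state": "done", "result": payload.upper()}

def poll(job_id):
    """Web tier: any instance can answer, since state lives in shared storage."""
    return status[job_id]

job = submit("render report")
print(poll(job))   # still queued; the HTTP request has already returned
work_one()
print(poll(job))   # the worker has finished
```

Because no request waits on the slow work, the 60 second timeout stops mattering, and because nothing is kept in a particular web server’s memory, round robin without sticky sessions is fine.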

In many ways, Azure is meant to be very developer friendly. In a lot of ways that’s good. As a system admin, however, I wince every time they go on about “You can deploy your app to Azure just by right clicking in Visual Studio!!!” Of course, that’s not how anyone with a responsibly controlled production environment would do it, but it certainly does make for fast easy adoption in development. The curve for a developer who is “just” a C++/Java/.NET/whatever wrangler to get up and going on an IaaS solution like Amazon is pretty large comparatively; here, it’s “go sign up for an account and then click to deploy from your IDE, and voila it’s running on the Intertubes.” So it’s a qualified good – it puts more pressure on you as an ops person to go get the developers to understand why they need to utilize your services. (In a traditional server environment, they have to go through you to get their code deployed.) Often, for good or ill, we use the release process as a touchstone to also engage developers on other aspects of their code that need to be systems engineered better.

Now, that’s my view of the major differences. I think the usual Azure sales pitch would say something different – I’ve forgotten two of their huge differentiators, their service bus and access control components. They are branded under the name “AppFabric,” which as usual is a name Microsoft is also using for something else completely different (a new true app server for Windows Server, including projects formerly code named Dublin and Velocity – think of it as a real WebLogic/WebSphere type app server plus memcache.)

Their service bus is an ESB. As alluded to above, you’re going to want to use it to do messaging. You can also use Azure Queues, which is a little confusing because the ESB is also a message queue – I’m not clear on their intended differentiation really. You can of course just load up an ESB yourself in any other IaaS cloud solution too, so if you really want one you could do e.g. Apache ServiceMix hosted on Amazon. But, they are managing this one for you which is a plus. You will need to use it to do many of the common things you’d want to do.

Their access control – is a mess. Sorry, Microsoft guys. The whole rest of the thing, I’ve managed to cut through the “Microsoft acronyms versus the rest of the world’s terms and definitions” factor, but not here. “You see, you use ACS’s WIF STS to generate a SWT,” says our Microsoft rep with a straight face. They seem to be excited that it will use people’s Microsoft Live IDs, so if you want people to have logins to your site and you don’t want to manage any of that, it is probably nice. It takes SAML tokens too, I think, though I’m not sure if the caveats around that end up equating to “Well, not really.” Anyway, their explanations have been incoherent so far and I’m not smelling anything I’m really interested in behind it. But there’s nothing to prevent you from just using LDAP and your own Internet SSO/federation solution. I don’t count this against Microsoft because no one else provides anything like this, so even if I ignore the Azure one it doesn’t put it behind any other solution.

The Future

Microsoft has said they plan to add on some kind of VM/IaaS offering eventually because of the demand. For us, the PaaS approach is a bit of a drawback – we want to do all kinds of things like “virus scan uploaded files,” “run a good load balancer,” “run an LDAP server”, and other things that basically require more full OS access. I think we may have an LDAP direction with the all-Java OpenDS, but it’s a pain point in general.

I think a lot of their decisions that are a short term pain in the ass (no installs, no synchronous) are actually good in the long term. If all developers knew how to develop async and did it by default, and if all software vendors, even Windows based ones, provided their product in a form that could just be “copy and run without admin privs” to install, the world would be a better place. That’s interesting in that “Sure it’s hard to use now but it’ll make the world better eventually” is usually heard from the other side of the aisle.

Conclusion

Azure’s a pretty legit offering! And I’m very impressed by their velocity. I think it’s fair to say that overall Azure isn’t quite as good as Amazon except for specific use cases (you’re writing it all in .NET by hand in Visual Studio) – but no one else is as good as Amazon either (believe me, I evaluated them) and Amazon has years of head start; Azure is brand new but already at about 80%! That puts them into the top 5 out of the gate.

Without an IaaS component, you still can’t do everything under the sun in Azure. But if you’re not depending on much in the way of big third party software chunks, it’s feasible; if you’re doing .NET programming, it’s very compelling.

Do note that I haven’t focused too much on the attributes and limitations of cloud computing in general here – that’s another topic – this article is meant to compare and contrast Azure to other cloud offerings so that people can understand its architecture.

I hope that was clear. Feel free to ask questions in the comments and I’ll try to clarify!

OpsCamp Debrief

I went to OpsCamp this last weekend here in Austin, a get-together for Web operations folks specifically focusing on the cloud, and it was a great time! Here’s my after action report.

The event invite said it was in the Spider House, a cool local coffee bar/normal bar. I hadn’t been there before, but other people who had said “That’s insane! They’ll never fit that many people! There’s outside seating but it’s freezing out!” That gave me some degree of trepidation, but I still racked out in time to get downtown by 8 AM on a Saturday (sigh!). Happily, it turned out that the event was really in the adjacent music/whatnot venue also owned by Spider House, the United States Art Authority, which they kindly allowed us to use for free! There were a lot of people there; we weren’t overfilling the place but it was definitely at capacity, with nearly 100 people in attendance.

I had just heard of OpsCamp through word of mouth, and figured it was just going to be a gathering of local Austin Web ops types. Which would be entertaining enough, certainly. But as I looked around the room I started recognizing a lot of guys from Velocity and other major shows; CEOs and other high ranked guys from various Web ops related tool companies. Sponsors included John Willis and Adam Jacob (creator of Chef) from Opscode, Luke Kanies from Reductive Labs (creator of Puppet), Damon Edwards and Alex Honor from DTO Solutions (formerly ControlTier), Mark Hinkle and Matt Ray from Zenoss, Dave Nielsen (CloudCamp), Michael Coté (Redmonk), Bitnami, Spiceworks, and Rackspace Cloud. Other than that, there were a lot of random Austinites and some guys from big local outfits (Dell, IBM).

You can read all the tweets about the event if you swing that way.

OpsCamp kinda grew out of an earlier thing, BarCampESM, also in Austin two years ago. I never heard about that, wish I had.

How It Went

I had never been to an “unconference” before. Basically there’s no set agenda, it’s self-emergent. It worked pretty well. I’ll describe the process a bit for other noobs.

First, there was a round of lightning talks. Brett from Rackspace noted that “size matters,” Bill from Zenoss said “monitoring is important,” and Luke from Reductive claimed that “in 2-4 years ‘cloud’ won’t be a big deal, it’ll just be how people are doing things – unless you’re a jackass.”

Then it was time for sessions. People got up, wrote a proposed session name on a piece of paper, and then went in front of the group and pitched it; a hand-count of “how many people find this interesting” was taken for each.

Candidates included:

service level to resolution
physical access to your cloud assets
autodiscovery of systems
decompose monitoring into tool chain
tool chain for automatic provisioning
monitoring from the cloud
monitoring in the cloud – widely dispersed components
agent based monitoring evolution
devops is the debil – change to the role of sysadmins
And more

We decided that so many of these touched on two major topics that we should do group discussions on them before going to sessions. They were:

monitoring in the cloud
config mgmt in the cloud

This seemed like a good idea; these are indeed the two major areas of concern when trying to move to the cloud.

Sadly, the whole-group discussions, especially the monitoring one, were unfruitful. For a long ass time people threw out brilliant quips about “Why would you bother monitoring a server anyway” and other such high-theory wonkery. I got zero value out of these, which was sad because the topics were crucially interesting – just too unfocused; you had people coming at the problem 100 different ways in sound bites. The only note I bothered to write down was that “monitoring porn” (too many metrics) makes it hard to do correlation. We had that problem here, and invested in a (horrors) non open-source tool, Opnet Panorama, that has an advanced analytics and correlation engine that can make some sense of tens of thousands of metrics for exactly that reason.

Sessions

There were three sessions. I didn’t take many notes in the first one because, being a Web ops guy, I was having to work a release simultaneously with attending OpsCamp 😛

Continue reading →

Filed under DevOps, Uncategorized

Tagged as chef, Cloud, Cloud Computing, controltier, DevOps, opscamp, puppet, systems, unconference, web admin, web ops, zenoss

by Ernest Mueller | August 5, 2008 · 2:21 pm

Cloud Headaches?

The industry is abuzz with people who are freaked out about the outages that Amazon and other cloud vendors have had. “Amazon S3 Crash Raises Doubts Among Cloud Customers,” says InformationWeek!

This is because people are going into cloud computing with unreasonably high expectations. This year at Velocity, Interop, etc. I’ve seen people just totally in love with cloud computing – Amazon’s specifically but in general as well. And it’s a good concept for certain applications. However, it is a computing system just like every other computing system devised previously by man. And it has, and will have, problems.

Whether you are using in house systems, or a SaaS vendor, or building “in the cloud,” you have the same general concerns. Am I monitoring my systems? What is my SLA? What is my recourse if my system is not hitting it? What’s my DR plan?

SaaS is a special case of cloud computing in general. And if you’re a company relying on it, when you contract with a SaaS vendor you get SLAs established and figure out what the remedy is if they breach it. If you are going into a relationship where you are just paying money for a cloud VM, storage, etc. and there is no enforceable SLA in the relationship, then you need to build the risk of likely and unremediable outages into your business plan.
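
To put some numbers on the SLA question, here’s a rough sketch that converts an availability percentage into the downtime it actually permits. The function name and the 30-day month are my own assumptions for illustration, not terms from any vendor’s contract.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_hours: float = 30 * 24) -> float:
    """Minutes of downtime an availability target permits over a period.

    period_hours defaults to a 30-day month (an assumption; real SLAs
    define their own measurement window).
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * 60 * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare a few common availability targets.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes down per month")
```

Even a “three nines” target leaves room for roughly three quarters of an hour of outage a month, and with no enforceable SLA at all the number you plan around is whatever you are willing to risk.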

I hate to break it to you, but the IT people working at Amazon, Google, etc. are not all that much smarter than the IT people working with you. So an unjustified faith in a SaaS or cloud vendor – “Oh, it’s Amazon, I’m sure they’ll never have an outage of any sort – either across their entire system or localized to my part of it – and if they do I’m sure the $100/month I’m paying them will cause them to give a damn about me” – is an unreasonable expectation on its face.

Clouds and cloud vendors are a good innovation. But they’re like every other computing innovation and vendor selling it to you. They’ll have bugs and failures. But treating them as if they won’t is a failure on your part, not theirs.

Filed under Cloud, Uncategorized

Tagged as Cloud, Cloud Computing, paas, reliability, SaaS, SLA
