Author Archives: Ernest Mueller

About Ernest Mueller

Ernest is the VP of Engineering at the cloud and DevOps consulting firm Nextira in Austin, TX.

Security and the Rise (and Fall?) of DevOps

As I’ve been involved with DevOps and its approach of blending development and operations staff together to create better products, I’ve started to see similar trends develop in the security space. I think there are some informative parallels, and both communities can learn from each other and perhaps avoid some pitfalls.

Here’s a recent article entitled “Agile: Most security guys are useless” that states the problem succinctly. In successful and agile orgs, the predominant mindset is that if you’re not touching the product, you are semi-useless overhead. And there’s some truth to that. When people are segregated into other “service” orgs – like operations or security – the us vs. them mindset predominates and strangles innovation in its crib.

The main initial drive of agile was to break down the wall between the devs and “the business,” but other walls remain that need similar breaking down. With DevOps, operations organizations faced with this same problem are innovating new approaches: developers and operations staff working together on the product as part of the same team. It’s working great for those who are trying it, from the big Web shops like Facebook to enterprise shops like us here at NI. The movement is gathering steam, and it seems clear to those of us doing it this way that it’s going to be a successful and disruptive pattern for adopters.

But let’s not pat ourselves on the back too much just yet. We still have a lot of opportunity to screw it up. Let’s review an example from another area.

In the security world, there is a whole organization, OWASP (the Open Web Application Security Project), whose goal is to promote and enable application security. Security people and developers, working together! Dev+Sec already exists! Or so the plan was.

Recently, however, there have been some “shots across the bow” in the OWASP community. Read Security People vs Developers and especially OWASP: Has It Reached A Tipping Point? The latter is by Mark Curphey, who started OWASP. He basically says OWASP is becoming irrelevant because it’s leaving developers behind. It’s becoming about “security professionals” selling tools, and there are few developers to be found in the community any more.

And this is absolutely true. We host the Austin OWASP chapter here at NI’s Austin campus, and two of the officers are NI employees. We make sure to invite NI developers to come to OWASP. Few do, at least not after the first couple of times. I asked some of the devs on our team why not, and here are some of the answers I got.

  • I want to leave sessions by saying, “I need to think about this the next time I code”. I leave sessions by saying, “that was cool, I can talk about this at a happy hour”. If I could do the former, I’d probably attend most/all the sessions.
  • A lot of the sessions don’t seem business focused and it is hard to relate. Demos are nicer; but a lot of times they are so specific (for example: specific OS + specific Java Version + specific javascript library = hackable) that it’s not actionable.
  • “Security people” think “developers” don’t know what they are doing and don’t care about security. Which to developers is offensive. We like to write secure applications; sometimes we just find the bugs too late….
  • I’ve gone to, I think, 4 OWASP meetings. Of those, I probably would only have recommended one of them to others – Michael Howard’s. I think it helped that he was a well-known speaker and seemed to have a developer focus. So, well-known speakers, with a compelling and relevant subject. Even then, the time has to be weighed against other priorities. For example, today’s meeting sounds interesting, but not particularly relevant. I’ll probably skip it.

In the end, the content at these meetings is more for security pros like pen testers, or for tool buyers in security or sysadmin groups. “How do I code more securely” is the alleged point of the group but frankly 90% of the activity is around scanners and crackers and all kinds of stuff that is fine but should be simple testing steps after the code’s written securely in the first place.

As a result, interesting ideas reminiscent of DevOps concepts have been coming from the security community. Pen tester @atdre gave a talk here to the Austin OWASP chapter about how security testers engaging with agile teams “from the outside” are failing, and argued that we should instead embed them on the team as its “security buddy.” (I love that term. Security buddy. I hate my “compliance auditor” without even meeting the poor bastard, but I like my security buddy already.) At the OWASP convention LASCON, Matt Tesauro delivered a great keynote similarly trying to refocus the group on the core problem of developing secure software; in fact, they’re co-sponsoring a movement called “Rugged” that has a manifesto similar to the Agile Manifesto but focused on security, availability, reliability, et cetera. (That makes it of interest to us sysadmin types, who are often saddled with somehow applying those attributes in production to someone else’s code…)

The DevOps community is already running the risk of “leaving the devs behind” too. I love all my buddies at Opscode and DTO and Puppet Labs and Thoughtworks and all. But a lot of DevOps discussions have become completely sysadmin focused as well: a litany of tools you can use for provisioning or monitoring or CI. That wouldn’t be so bad if there were a real entry point for developers, like “Here’s how you as a developer interact with Chef to deploy your code” or “Here’s how you make your code monitorable.” But those are often fringe discussions around the core content, which mainly warms the cockles of a UNIX sysadmin’s heart. Why would any of my devs want to see a presentation on how to install Puppet? Well, that’s what they got at a recent Austin Cloud User Group meeting.

As a result, my devs have stopped coming to DevOps events.  When I ask them why, I get answers similar to the ones above for why they’re not attending OWASP events any more. They’re just not hearing anything that is actionable from the developer point of view. It’s not worth the two hours of their valuable time to come to something that’s not at all targeted at them.

And that’s eventually going to scuttle DevOps if we let it happen, just as it’ll scuttle OWASP if it continues there. The core value of agile is PEOPLE over processes and tools, COLLABORATION over negotiation. If you are leaving the collaboration behind and just focusing on tools, you will eventually fail, just in a more spectacular and automated fashion.

The focus at DevOpsDays US 2010 was great, it was all about culture, nothing about tools. But that culture talk hasn’t driven down to anything more actionable, so tools are just rising up to fill the gap.

In my talk at that DevOpsDays I likened these new tools and techniques to the introduction of the Minié ball to rifles during the Civil War. In that war, armies adopted the new tools but retained their old tactics, advancing in close lines designed for weapons with much shorter range and much lower accuracy – and the slaughter was profound.

All our new DevOps tools are great, but in the same way, if we don’t adapt our way of thinking to them, they will make our lives worse, not better, for all their vaunted efficiency. You can do the wrong thing en masse and more quickly. The slaughter will similarly be profound.

A sysadmin suddenly deciding to code his own tools isn’t really the heart of DevOps.  It’s fine and good, and I like seeing more tools created by domain experts. But the heart of DevOps, where you will really see the benefits in hard ROI, is developers and operations folks collaborating on real end-consumer products.

If you are doing anything DevOpsey, please think about “Why would a developer care about this?” How is it actionable to them, how does it make their lives easier?  I’m a sysadmin primarily, so I love stuff that makes my job easier, but I’ve learned over the years that when I can get our devs to leverage something, that’s when it really takes off and gives value.

The same thing applies to the people on the security side.  Why do we have this huge set of tools and techniques, of OWASP Top 10s and Live CDs and Metasploits and about a thousand wonderful little gadgets, but code is pretty much as shitty and insecure as it was 20 years ago? Because all those things try to solve the problem from the outside, instead of targeting the core of the matter, which is developers developing secure code in the first place.  And to do that, it’s more of a hearts-and-minds problem than a tools-and-processes problem.

That’s a core realization that operations folks, security folks, testing folks, and probably a bunch of other folks need to reach, deeply internalize, and let change the way they look at the world and how they conduct their work.

Filed under DevOps, Security

Why Amazon Reserve Instances Torment Me

We’ve been using over 100 Amazon EC2 instances for a year now, but I’ve only just made my first reserve instance purchase. For the untutored, reserve instances let you pay a yearly upfront fee per instance in exchange for a much, much lower hourly cost. On its face, it’s a good deal. Take a normal Large instance you’d use for a database. For a Linux one, it’s $0.34 per hour. Or you can pay $910 up front for the year, and then it’s only $0.12 per hour. So theoretically, it takes your yearly cost from $2,978.40 to $1,961.20. A great deal, right?
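The arithmetic behind those two yearly figures, as a quick check. This is a minimal sketch that assumes a 365-day year (8,760 hours) and the 2011 m1.large Linux prices quoted above; the function names are just for illustration. `Decimal` avoids floating-point rounding noise in the dollar amounts.

```python
from decimal import Decimal

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years

# Prices quoted in the post for a Linux m1.large (2011 pricing, not current)
ON_DEMAND_HOURLY = Decimal("0.34")
RESERVED_UPFRONT = Decimal("910")
RESERVED_HOURLY = Decimal("0.12")


def yearly_on_demand(hours: int = HOURS_PER_YEAR) -> Decimal:
    """Cost of running one on-demand instance for the given number of hours."""
    return ON_DEMAND_HOURLY * hours


def yearly_reserved(hours: int = HOURS_PER_YEAR) -> Decimal:
    """Upfront fee plus the discounted hourly rate for the hours actually run."""
    return RESERVED_UPFRONT + RESERVED_HOURLY * hours


if __name__ == "__main__":
    print(yearly_on_demand())  # 2978.40
    print(yearly_reserved())   # 1961.20
```

Note that the reserved figure only holds if the instance runs all 8,760 hours; pass a smaller `hours` to see the savings shrink.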

Well, not so much. The devil is in the details.

First of all, you have to make sure you’re running all those instances all the time. If you buy a reserve instance and then don’t use it some of the time, you immediately start cutting into your savings. The crossover is at about 172 days: if you don’t run the instance at least 172 days out of the year, you are upside down on the deal.
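Where the 172-day figure comes from, assuming the instance runs for whole days ($d$ days per year) at the prices above: the on-demand cost grows at $24 \times 0.34$ per day, while the reserved cost starts at the $910 fee and grows at $24 \times 0.12$ per day. Setting the two equal gives the crossover.

```latex
\begin{aligned}
C_{\text{on-demand}}(d) &= 24 \times 0.34\, d = 8.16\, d \\
C_{\text{reserved}}(d)  &= 910 + 24 \times 0.12\, d = 910 + 2.88\, d \\
C_{\text{on-demand}}(d^\ast) = C_{\text{reserved}}(d^\ast)
  &\;\Rightarrow\; d^\ast = \frac{910}{8.16 - 2.88} = \frac{910}{5.28} \approx 172.3 \text{ days}
\end{aligned}
```

Below roughly 172 days of use per year, plain on-demand is cheaper.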

But what’s the big deal, you ask?  Sure, in the cloud you are probably (and should be!) scaling up and down all the time, but as long as you reserve up to your low water mark it should work out, right?

So the second big problem is that when you reserve instances, you have to specify everything about that instance. You aren’t reserving “10 instances,” or even “10 large instances”; you have to specify:

  • Platform (UNIX/Linux, UNIX/Linux VPC, SUSE Linux, Windows, Windows VPC, or Windows with SQL Server)
  • Instance Type (m1.small, etc.)
  • AZ (e.g. us-east-1b)

You also have to choose tenancy and term. So you have to reserve “a small multitenant Linux instance in us-east-1b for one year.” Having to specify down to this level is really problematic in any kind of dynamic environment.

Let’s say you buy 10 m1.large reserve instances for your databases, and then you realize later you really need to move up to an m1.xlarge. Well, tough. You can, but if you don’t have 10 other things to run on those larges, you lose money. The same goes if you decide to change OS. One of our biggest expenditures is our compile farm workers; on those we hope to move from Windows to Linux once we get the software issues worked out, and we’re experimenting to find the best cost/performance among different instance sizes. I’m effectively blocked from buying reserve for those, since doing so would put a stop to our ability to innovate.
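To see why the m1.large to m1.xlarge move hurts, here is a simplified model in which a reservation only applies to a running instance whose platform, instance type, AZ, and tenancy all match exactly (term is left out). It illustrates the matching rule described above and is not Amazon's billing code; the `Spec` type and the platform string are placeholders of my own.

```python
from collections import Counter
from typing import NamedTuple


class Spec(NamedTuple):
    """The attributes a reservation is pinned to (term omitted for brevity)."""
    platform: str
    instance_type: str
    az: str
    tenancy: str = "default"


def unused_reservations(reserved: Counter, running: Counter) -> Counter:
    """Reserved slots with no exactly matching running instance.

    Counter subtraction drops zero and negative counts, so the result only
    contains specs where reservations outnumber matching instances.
    """
    return reserved - running


# Ten m1.large Linux reservations in us-east-1b (the database example)
reserved = Counter({Spec("UNIX/Linux", "m1.large", "us-east-1b"): 10})

# The databases later move up to m1.xlarge in the same AZ
running = Counter({Spec("UNIX/Linux", "m1.xlarge", "us-east-1b"): 10})

print(unused_reservations(reserved, running))
```

All ten reservations come back as unused, even though the same workload is still running in the same AZ on a bigger box.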

And more subtly, let’s say you’re doing dynamic scaling and splitting across AZs, like they always say you should for availability purposes. If I’m running 20 instances and scaling them across 1b and 1c, I am not guaranteed to always be running 10 in 1b and 10 in 1c; it’s more random than that. Instead of buying 20 reserve, you have to buy, say, 7 in 1b and 7 in 1c to make sure you don’t end up losing money.
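How often does a 20-instance fleet actually split evenly? Amazon doesn't document how instances land in AZs, so as a purely hypothetical model, assume each instance independently lands in 1b or 1c with equal probability. The binomial sum below gives the probability that both AZs keep at least `k` instances, which is what a reservation of `k` per AZ needs in order to stay fully used.

```python
from math import comb


def prob_both_azs_at_least(total: int, k: int, p: float = 0.5) -> float:
    """Probability that each of two AZs ends up with at least k instances.

    Hypothetical model: each of `total` instances lands in AZ "b" with
    probability p and in AZ "c" otherwise, independently of the others.
    """
    prob = 0.0
    for in_b in range(total + 1):
        in_c = total - in_b
        if in_b >= k and in_c >= k:
            prob += comb(total, in_b) * p**in_b * (1 - p) ** in_c
    return prob


for k in (10, 8, 7, 5):
    print(k, round(prob_both_azs_at_least(20, k), 3))
```

Under this model an exact 10/10 split happens only about 18% of the time, while at least 7 in each AZ holds about 88% of the time, which is consistent with the instinct to reserve 7 + 7 rather than 10 + 10.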

Heck, they even differentiate between Linux, SUSE Linux, and Linux VPC instances, which clearly crosses over into annoyingly picky territory.

As a result of all this, it is pretty undesirable to buy reserve instances unless you have a very stable environment, both technically and scale-wise. That sentence doesn’t describe the typical cloud use case in my opinion.

I understand, obviously, why they are doing this. From a capacity planning standpoint, it’s best for them if they make you specify everything. But what I don’t think they understand is that this shrinks the pool of people willing to buy reserve, and reserve is not only upfront money but also a lock-in period, which should be grotesquely attractive to a business. I put off buying reserve for a year because of this, and even now that I’ve done it I’m not buying nearly as many reserve as I could, because I have to hedge my bets against ANY changes to my service. It seems to me that this also undermines the alleged point of reserves, which is capacity planning – if you’re so picky about it that no one buys reserve and 95% of your instances are on demand, then you can’t plan real well, can you?

What Amazon needs to do is meet customers halfway. It’s all a probabilities game anyway. They lose the specificity of each given reserve request, but get many more reserve requests (and all the benefits they convey – money, lock-in, capacity planning info) in return.

Let’s look at each axis of inflexibility and analyze it.

  • Size. Sure, they have to allocate machines, right? But I assume they understand they are using this thing called “virtualization.” If I want to trade in 20 reserved small instances for 5 large instances (each large is 4x a small), why not? It costs them nothing to allow this. They just have to make the effort to support it in their console/APIs. I can understand needing to reserve a certain number of “units,” but those should be flexible across exact instance types at any given time.
  • OS. Why on God’s green earth do I need to specify OS?  Again, virtualized right? Is it so they can buy enough Windows licenses from Microsoft?  Cry me a river.  This one needs to leave immediately and never come back.
  • AZ. This is annoying from the user POV but probably the most necessary from the Amazon POV, because they have to put enough hardware in each data center, right? Still, I think they should try to make this a per-region rather than a per-AZ limit. If I could just reserve “us-east” in Virginia rather than a specific AZ, that would accommodate all my use cases.
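The unit-based, region-wide reservation proposed above could work roughly like this sketch. Everything here is hypothetical: Amazon offered nothing like it when this was written, the weights use the post's "a large is 4x a small" ratio, and the m1.xlarge weight of 8 is my own assumption. OS is ignored entirely, as argued above.

```python
from typing import Iterable, Tuple

# Hypothetical size weights in "small instance units"; xlarge = 8 is assumed
UNITS = {"m1.small": 1, "m1.large": 4, "m1.xlarge": 8}


def region_of(az: str) -> str:
    """'us-east-1b' -> 'us-east-1' (strip the trailing AZ letter)."""
    return az[:-1]


def covered_units(reserved_units: int, region: str,
                  running: Iterable[Tuple[str, str]]) -> int:
    """Units of running capacity that a region-wide, size-flexible reservation covers.

    `running` is a sequence of (instance_type, az) pairs; any AZ in the
    region and any size count toward the reservation.
    """
    used = sum(UNITS[itype] for itype, az in running if region_of(az) == region)
    return min(reserved_units, used)


# 20 reserved small-equivalents, now spent on 5 larges spread over two AZs
fleet = [("m1.large", "us-east-1b")] * 3 + [("m1.large", "us-east-1c")] * 2
print(covered_units(20, "us-east-1", fleet))  # 20
```

With this kind of matching, both the size change and the uneven AZ split stop wasting reserved capacity.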

In the end, one of the reasons people move to the cloud in the first place is to get rid of the constraints of hardware. When Amazon just puts those constraints back in place, it becomes undesirable. Frankly, even now, I tried to just pay Amazon up front rather than actually buy reserve, but they’re not really enterprise friendly yet from a finance point of view, so I couldn’t make that happen. In the end I reluctantly bought reserve. The analytics around it are lacking too – I can’t look in the Amazon console and see “Yes, you’re using all 9 of your large/linux/us-east-1b instances.”

Amazon keeps innovating greatly in the technology space, but in the customer interaction space they need a lot of help. They’re not the only game in town, and competitors with less technically sophisticated offerings but more aggressive customer service/support will erode their market share. I can’t help thinking that Rackspace’s play of backing OpenStack is the purest example of this – “Anyone can run the same cloud we do, but we are selling our ‘fanatical support’” is the message.

Filed under Cloud

DevOps In Action Podcast

While at SXSW, I recorded an episode of the IT Management and Cloud Podcast with Michael Coté and John Willis. You can find it on Coté’s People Over Process blog! Sorry about the background noise, we were doing it in the upstairs bar at the Driskill.

Filed under Cloud, Conferences, DevOps

SXSW Interactive 2011 Day Two

Day 2 started off a bit rocky but then got really good. I tried to get to the Lean Startup: King of the Apps Showdown session, but the shuttle buses take a long route and are few and far between, so I got there pretty late. I saw snapappointments.com, snapwebsites, Planzai, Icebreakr, and mapomatic all get their app and pitch critiqued by a panel of folks – Robert Scoble, Eric Ries, Dave McClure, Bill Boebel, and Stacey Higginbotham. They demanded mapomatic release right away, and told snapwebsites to remove half of his features. I was ambivalent about this advice. I can see streamlining the UI to hide advanced functionality, but “remove functionality”? Sure, a simple app has a wider reach, but if you can effectively provide an advanced mode/version with the functionality professionals need, wouldn’t that be better? I just don’t like how all these apps are degenerating into one very narrow function “plus it checks you into Foursquare!!!!” Bah.

They joked about how many of the candidates had “Snap” in their name, and then on the bus on the way back I met a girl from TeamSnap. They have a site that lets you arrange sports teams, even take Paypal payments for team dues, very slick.

And congratulations to friend of the blog Lenny Rachitsky (@lennysan) of Localmind, who has been singled out by Scoble as “best SXSW app so far!” You should all go download it from the App Store now.

Then I tried to go to the session on How To Innovate At Big Companies, with Gene Kim of Visible Ops fame – I left the AT&T center early and went all the way to the Hyatt. But it was so full that 30 people were standing outside, only allowed in if someone left. So I went over to the Hilton to try to catch Mistakes I Made Building Netflix for the iPhone. It also had a line of “one in, one out” standby people. Son of a bitch. They’d better be videoing this stuff. Anyway, at this point it was too far into the slot, so I just went to the Screenburn Arcade, which was fun. Got my picture taken with Loren Wiseman of Traveller fame – apparently the venerable RPG is being turned into an iPhone MMO!

To recuperate from our disappointing morning, Peco and I went and had a burger at Casino El Camino, which takes a ridiculously long time but is worth it; they have the best burgers in Austin. We hooked up with Coté and John Willis there, and then went to the Etsy Code as Craft: Moving Fast At Scale event, which was brilliant and DevOpsey. Continuous deployment, only coding off trunk, logging and metrics, “dashboard driven development,” and more. Many of the topics are covered on the Etsy Code as Craft blog – they didn’t record the talk but promise to record it the next time they give it and put it up.

I took a lot of notes but the high points were:

  • They have open sourced their log collection/graphing tool, Logster, and their stats collection daemon, statsd.
  • They perform hundreds of releases a month but had only six “bad deploys” over the course of a year, by combining one-button deploy, testing, feature flags, and a theory of roll-forward. No more organized releases, release managers, rollbacks, all that stuff.
  • They have a cute “deployinator” dashboard that lets people (and not just coders, they have more people deploying than they have developers) do the deploys – they’ll open source it, but stressed that it’s not much technically, the culture and practices are 99% of the work to get to this.
  • They use a lot of Chef but don’t use it for code deploys, because that’s a simple task that they need more control over.
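For a sense of how lightweight statsd is from the application side: in its text protocol each metric is a single UDP datagram of the form `name:value|type`, where `c` is a counter and `ms` a timer, optionally followed by `|@rate` for sampled metrics, and 8125 is the port statsd listens on by default. The sketch below is my own minimal client, not Etsy's code, and the metric names are made up.

```python
import socket


def statsd_line(name: str, value: float, metric_type: str = "c",
                sample_rate: float = 1.0) -> str:
    """Format one metric in the statsd text protocol: name:value|type[|@rate]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def send(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send: the app never waits on the stats daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    send(statsd_line("deploys", 1))                      # counter
    send(statsd_line("checkout.render_time", 320, "ms"))  # timer in milliseconds
```

Using UDP is the design point here: instrumenting code costs a single datagram, so developers can sprinkle metrics freely without slowing the request path.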

That was a two hour event, so then after a little VIP partyin’ atop Fogo de Chao, we packed it in for the day.

Filed under Conferences

DevOps at CloudCamp at SXSWi!

Isn’t that some 3l33t jargon.  Anyway, Dave Nielsen is holding a CloudCamp here in Austin during SXSW Interactive on Monday, March 14 (followed by the Cloudy Awards on March 15) and John Willis of Opscode and I reached out to him, and he has generously offered to let us have a DevOps meetup in conjunction.

John’s putting together the details but is also traveling, so this is going to be kind of emergent.  If you’re in Austin (normally or just for SXSW) and are interested in cloud, DevOps, etc., come on out to the CloudCamp (it’s free and no SXSW badge required) and also participate in the DevOps meetup!

In fact, if anyone’s interested in doing short presentations or show-and-tells or whatnot, please ping John (@botchagalupe) or me (@ernestmueller)!

Filed under Cloud, Conferences, DevOps

SXSW Interactive 2011 Day One

We started out the week gently, with a light Friday (why am I still so tired then?).  Two of your favorite Agile Admins, Peco and Ernest, were down at the Austin Convention Center to experience SXSW Interactive!  The third Agile Admin, James, was busy giving a talk at the nearby Security B-Sides Austin.

After getting down there early, getting badges, and getting oriented, we saw Jason Calacanis interview Tim O’Reilly. He gave a lot of interesting insight into the development of innovation, tracing the Internet, open source, Web 2.0/social media/user contributed content, cloud, and Big Data.

The single most interesting thing he said, though, was a side note that illustrated how hard it is for companies to maintain a real vision over time, especially once they get big and various stakeholders’ needs conflict. He talked about how many people have become billionaires off of O’Reilly ideas and how there’s been pressure for him to sell out or cash in; Cisco, for example, offered to buy them, noting that “You guys are always first on the scene with the cool stuff but you fail to exploit it.” Tim always rejected those offers, though he did note he is often conflicted because of all the people he has working for him, and not making all those great people the money that a sale would bring.

Next, Peco saw Google’s Marissa Mayer talk (he’ll have to share what went on there) and I kinda played hooky by going to a session about “daddy bloggers,” as that’s a personal interest of mine.

Then we both went to a session that was supposed to be about “The Connected Car: Driving Technology” and automobile telemetrics, but it sucked. It was some car guys and a lady from Pandora telling us over and over again that “cars hooked up to networks and stuff are cool.” I am willing to put up with about 5 minutes of telling me something’s cool, but if it doesn’t give way quickly to you SHOWING me that it’s cool, I’m out.  We bailed 30 minutes in, along with a lot of the other attendees.

Then we went down to the Austin Music Hall for four hours of Ignite, a format where presenters get 5 minutes and auto-advancing slides to make a point. The topic was “2021 Visions of the Future.”  Besides the talks and some bands, there were also a bunch of Arduino and robotics and various weird Maker type booths, which was fun. The crowd was really varied, in fact there were other people there from NI who had been invited via various completely different vectors (UT School of Engineering, Austin Ventures startup stuff, SXSWi). [Side note – there was a contest to drop an egg from the balcony safely using only 4 sheets of paper and a couple feet of tape, and I was one of the winners and got a Roku for my troubles! Engineering education FTW.]

Some presentations were good, others incoherent, but a common theme throughout the day was people doing things they have a passion for and worrying about the money later. This was a theme in Tim O’Reilly’s talk, and it pervaded the presentations at Ignite. Really, most people aren’t in their field because it’s the highest-ROI thing they qualify for; they are in it because they have some kind of passion for it. A lot of life then tries to stomp that passion out, but the enabling factors of the Web, DIY, etc. are making it so you can flip off “the Man” and pursue your passion yourself if you want to. As a large company we struggle with that: we try to promote people following their passions and enable internal entrepreneurship at a high level, but the natural grinding wheels of a large organization grind that into meal at the middle levels.

Then we got free ice cream from the Free Ice Cream Man and headed home. I’m getting sick and feel whupped, but it was energizing to see so many people doing so many great things – and let me tell you, SXSWi is getting HUGE. It’s nothing like the first year I went, when there were maybe ten sessions and the conference center was largely empty. The place was packed; companies have bought out and transformed nearby buildings – a bar across the street is now the PlayStation Lounge, with a huge video wall on the roof and stuff; CNN had a giant neon sign installed at one restaurant… I can tell the economy’s looking up and that Interactive is hot, because Lordy, there’s money getting thrown at this thing.

See the rest of my pictures from Day One of SXSW here!

Filed under Conferences

SXSW Tips If You’re From Out Of Town

I just went to lunch with a visitor in town for SXSW (@lennysan from Localmind) and it reminded me of some of the ‘gotchas’ that someone not from Austin may not know.

  • If your hotel doesn’t have a shuttle and is not right in the downtown area, you will need a car (unless you are really close to the one light rail line, and I wouldn’t bet on it not being totally overloaded). Austin is big and not pedestrian friendly except IMMEDIATELY adjacent to the river.
  • Here’s an awesome downtown parking guide from Community Impact. Shows where they are, what the rates are, whether they’re lot or structure.
  • You can rent your gun at the airport when you land, to avoid spending time finding another place to do it.
  • Just because a place has a big “BBQ” sign doesn’t mean you should go eat barbecue there.  We were eating real BBQ at Rudy’s and looking across the highway at the Bone Daddy’s that has a big ol’ sign on it saying BBQ.  I’m not saying Bone Daddy’s doesn’t have things to recommend it, but BBQ isn’t really its strength.  If you’re downtown, eat at the Iron Works, and if anyone is organizing a field trip out of town to the really golden places in Taylor or Llano, go along.
  • Someone asked on Localmind about breakfast during SXSW – here, breakfast is breakfast tacos, except for hotel restaurants and Denny’s.

Any more tips for furriners?  Post them here!

Filed under Conferences

SXSW Interactive Is Here!

SXSW Interactive is going on in Austin tomorrow through next Tuesday, and loads of great cloud and DevOps folks will be in town for it. Looking forward to talking with @cote, @lennysan, @botchagalupe, @davenielsen, @ehuddleston, and many more.

Here’s what I think the best cloud/devops/high tech related tickets are, let me know what I’m missing! A lot of the off premise events don’t even require badges and are mostly free.

Random Off Premise SXSW Interactive Stuff

Other events – not job related but make me happy:

Sessions

Sessions are less important than the other stuff so they’re on here second!  No time for links, search on ’em.

What’s the good stuff I haven’t mentioned? DevOps, Cloud, NoSQL, and other cool stuff: report it below!

Filed under Conferences

Amazon CloudFormation: Model Driven Automation For The Cloud

You may have heard about Amazon’s newest offering they announced today, CloudFormation.  It’s the new hotness, but I see a lot of confusion in the Twitterverse about what it is and how it fits into the landscape of IaaS/PaaS/Elastic Beanstalk/etc. Read what Werner Vogels says about CloudFormation and its uses first, but then come back here!

Allow me to break it down for you and explain why this is such a huge leverage point for cloud developers.

What Has Come Before

Up till now on Amazon you could configure a single virtual image the way you wanted it, with an AMI. You could even sort of construct a scalable tier of similar systems using Auto Scaling, by defining Launch Configurations. But if you wanted to construct an entire multitier system it was a lot harder. There are automated configuration management tools like Chef and Puppet out there, but their recipes/models tend to be oriented around getting a software loadout onto an existing system, not the actual system provisioning. In general they come from the older assumption that someone is doing the provisioning, probably on physical systems, using bcfg2 or Cobbler or Vagrant or something.

So what were you to do if you wanted to bring up a simple three tier system, with a Web tier, app server tier, and database tier?  Either you had to set them up and start them manually, or you had to write code against the Amazon APIs to explicitly pull up what you wanted. Or you had to use a third party provisioning provider like RightScale or EngineYard that would let you define that kind of model in their Web consoles but not construct your own model programmatically and upload it. (I’d like my product functionality in my own source control and not your GUI, thanks.)

Now, Amazon recently launched Elastic Beanstalk, which is way over on the PaaS side of things, similar to Google App Engine. “Just upload your application and we’ll run it and scale it, you don’t have to worry about the plumbing.” Of course this sharply limits what you can do, and doesn’t address the question of “what if my overall system consists of more than just one Java app running in Beanstalk?”

If your goal is full model driven automation to achieve “infrastructure as code,” none of these solutions is entirely satisfactory. I understand CloudFormation deeply because we went down the same path and developed our own system model in response!

I’ll also note that this is very similar to what Microsoft Azure does.  Azure is a hybrid IaaS/PaaS solution – their marketing tries to say it’s more like Beanstalk or Google App Engine, but in reality it’s more like CloudFormation – you have an XML file that describes the different roles (tiers) in the system, defines what software should go on each, and lets you control the entire system as a unit.

So What Is CloudFormation?

Basically CloudFormation lets you model your Amazon cloud-based system in JSON and then provision and control it as a unit.  So in our use case of a three tier system, you would model it up in their JSON markup and then CloudFormation would understand that the whole thing is a unit.  See their sample template for a WordPress setup. (A mess more sample templates are here.)

Review the WordPress template; it lets you define the AMIs and instance types, what the security group and ELB setups should be, the RDS database back end, and feed in variables that’ll be used in the consuming software (like WordPress username/password in this case).
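To make that structure concrete, here is a trimmed, hypothetical template in the same spirit as the WordPress sample (it is not that sample): a parameterized web instance behind an ELB, a security group, and an RDS back end. It is built as a Python dict so it can be emitted as JSON. Resource types and property names follow the CloudFormation template format as I understand it, but the logical names (`AmiId`, `WebServer`, and so on) and values are made up, and the template has not been validated against the service.

```python
import json


def three_tier_template() -> dict:
    """A trimmed, illustrative CloudFormation template (web tier + database)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: load-balanced web server with an RDS back end",
        "Parameters": {
            "AmiId": {"Type": "String", "Description": "AMI for the web tier"},
            "DBPassword": {"Type": "String", "NoEcho": "true"},
        },
        "Resources": {
            "WebSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "HTTP in from anywhere",
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": "80",
                         "ToPort": "80", "CidrIp": "0.0.0.0/0"}
                    ],
                },
            },
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "AmiId"},  # filled in at launch time
                    "InstanceType": "m1.small",
                    "SecurityGroups": [{"Ref": "WebSecurityGroup"}],
                },
            },
            "LoadBalancer": {
                "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},  # AZs of the stack's region
                    "Listeners": [{"LoadBalancerPort": "80",
                                   "InstancePort": "80", "Protocol": "HTTP"}],
                    "Instances": [{"Ref": "WebServer"}],
                },
            },
            "Database": {
                "Type": "AWS::RDS::DBInstance",
                "Properties": {
                    "Engine": "MySQL",
                    "DBInstanceClass": "db.m1.small",
                    "AllocatedStorage": "5",
                    "MasterUsername": "admin",
                    "MasterUserPassword": {"Ref": "DBPassword"},
                },
            },
        },
        "Outputs": {
            "URL": {"Value": {"Fn::GetAtt": ["LoadBalancer", "DNSName"]}},
        },
    }


print(json.dumps(three_tier_template(), indent=2))
```

Generating the JSON from code rather than hand-editing it is one way to keep a family of similar templates consistent.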

Once you have your template you can tell Amazon to start your “stack” in the console! It’ll even let you hook it up to a SNS notification that’ll let you know when it’s done. You name the whole stack, so you can distinguish between your “dev” environment and your “prod” environment for example, as opposed to the current state of the Amazon EC2 console where you get to see a big list of instance IDs – they added tagging that you can try to use for this, but it’s kinda wonky.
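Starting named stacks from code rather than the console could look like the following sketch. It only assembles the request: the keys follow the CloudFormation CreateStack API (`StackName`, `TemplateBody`, `Parameters`, `NotificationARNs`), and the resulting dict is the kind of thing you would pass as keyword arguments to an SDK call such as boto3's `create_stack`. That call is omitted so the example runs without AWS credentials. The template stand-in, parameter names, and SNS topic ARN are hypothetical.

```python
import json
from typing import Dict, List


def stack_request(env: str, template: dict, params: Dict[str, str],
                  topic_arn: str) -> dict:
    """Keyword arguments for a CloudFormation CreateStack call.

    The environment name becomes the stack name, so "dev" and "prod"
    show up as separate, individually controllable units.
    """
    parameters: List[dict] = [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(params.items())
    ]
    return {
        "StackName": env,
        "TemplateBody": json.dumps(template),
        "Parameters": parameters,
        "NotificationARNs": [topic_arn],  # SNS topic that receives the stack's events
    }


TEMPLATE = {"Resources": {}}  # hypothetical stand-in; a real stack needs resources
TOPIC = "arn:aws:sns:us-east-1:123456789012:stack-events"  # hypothetical ARN

dev = stack_request("dev", TEMPLATE, {"InstanceType": "m1.small"}, TOPIC)
prod = stack_request("prod", TEMPLATE, {"InstanceType": "m1.large"}, TOPIC)
print(dev["StackName"], dev["Parameters"])
```

The same template serves both environments; only the stack name and parameters differ.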

Why Do I Want This Again?

Because a system model lets you do a number of clever automation things.

Standard Definition

If you’ve been doing Amazon yourself, you’re used to there being a lot of stuff you have to do manually. Even you do it differently from one system build to the next, and God forbid you have multiple techies working on the same Amazon system. The basic value proposition of “don’t do things manually” is huge. You configure the security groups ONCE and put them into the template, and then you’re not going to forget to open port 23 AGAIN next time you start a system. A core part of what DevOps is realizing as its value proposition is treating system configuration as code that you can source control, fix bugs in and have them stay fixed, etc.

And if you’ve been trying to automate your infrastructure with tools like Chef, Puppet, and ControlTier, you may have been frustrated that they address single systems well but do not really model “systems of systems” worth a damn. With the new cloud support in knife and the like, you can execute raw “start me a cloud server” commands, but all that nice recipe stuff stops at the box level and doesn’t really extend up to provisioning and tracking the parts of your system.

With the CloudFormation template, you have an actual asset that defines your overall system.  This definition:

  • Can be controlled in source control
  • Can be reviewed by others
  • Is authoritative, not documentation that could differ from the reality
  • Can be automatically parsed/generated by your own tools (this is huge)
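That last point deserves an example. Since the template is plain JSON, your own tools can review it before anything launches. This is a minimal sketch of one such check, assuming a policy (hypothetical, pick your own) that only ports 80 and 443 may be open to `0.0.0.0/0`; it also assumes literal port numbers rather than `Ref`s in the ingress rules.

```python
import json

PUBLIC_CIDR = "0.0.0.0/0"


def find_open_ports(template_json, allowed_public=frozenset({80, 443})):
    """Return (logical_id, from_port, to_port) for world-open ingress rules
    that expose ports outside allowed_public.

    Assumes FromPort/ToPort are literal numbers, not Refs; the allowed set
    is a hypothetical policy.
    """
    template = json.loads(template_json)
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::SecurityGroup":
            continue
        rules = resource.get("Properties", {}).get("SecurityGroupIngress", [])
        for rule in rules:
            if rule.get("CidrIp") != PUBLIC_CIDR:
                continue  # restricted to a narrower range; not flagged here
            lo, hi = int(rule["FromPort"]), int(rule["ToPort"])
            if not all(p in allowed_public for p in range(lo, hi + 1)):
                findings.append((logical_id, lo, hi))
    return findings
```

Run in a pre-commit hook or CI job, a check like this turns "can be reviewed by others" into something that happens on every change.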

It’s also nicely transparent; when you go to the console and look at the stack it shows you the history of events, the template used to start it, the startup parameters it used… Moving away from the “mystery meat” style of system config.

Coordinated Control

With CloudFormation, you can start and stop an entire environment with one operation. You can say “this is the dev environment” and be able to control it as a unit. I assume at some point you’ll be able to visualize it as a unit; right now all the bits are still stashed in their own tabs (and I notice they don’t make any default use of their own tagging, which makes it annoying to pick out which parts are from that stack).

This is handy for not missing stuff on startup and teardown. A couple of weeks ago I spent an hour deleting a couple hundred rogue EBS volumes we had left over after a load test.
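Until everything is launched as a stack, a little scripting can at least find the leftovers. This sketch takes volume records shaped like EC2 DescribeVolumes output (`VolumeId`, `State`, optional `Tags` list of `Key`/`Value` pairs) and returns unattached volumes (state `available`) that aren't tagged with a live stack. The `Stack` tag key is a hypothetical convention you would apply yourself.

```python
def find_orphaned_volumes(volumes, live_stacks, tag_key="Stack"):
    """Return IDs of unattached volumes not tagged with a live stack name.

    `volumes` mirrors the shape of EC2 DescribeVolumes records; `tag_key`
    is a hypothetical tagging convention, not something AWS sets for you.
    """
    orphans = []
    for vol in volumes:
        # "available" means the volume is not attached to any instance
        if vol.get("State") != "available":
            continue
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
        if tags.get(tag_key) not in live_stacks:
            orphans.append(vol["VolumeId"])
    return orphans
```

With boto3, the input could come from `boto3.client("ec2").describe_volumes()["Volumes"]`; review the list by hand before feeding anything to `delete_volume`.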

And you get some status eventing. One of the most painful parts of trying to automate against Amazon is the whole “I started an instance, I guess I’ll sit around and poll and try to figure out when the damn thing has come up right.” In CloudFormation you get events that tell you when each part, and then the whole, is up and ready for use.
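Consuming those events is simple bookkeeping. This sketch folds a list of stack events (using the `LogicalResourceId`, `ResourceType`, and `ResourceStatus` fields CloudFormation reports) into which resources are ready, which failed, and the stack's own status, which arrives as an event of type `AWS::CloudFormation::Stack`. It only looks at the create path; update and delete statuses are left out for brevity.

```python
STACK_TYPE = "AWS::CloudFormation::Stack"


def summarize_events(events):
    """Fold stack events (oldest first) into (ready, failed, stack_status).

    Only CREATE_* statuses are interpreted; a sketch, not a full state machine.
    """
    latest = {}
    stack_status = None
    for ev in events:
        if ev["ResourceType"] == STACK_TYPE:
            stack_status = ev["ResourceStatus"]  # status of the stack as a whole
        else:
            latest[ev["LogicalResourceId"]] = ev["ResourceStatus"]
    ready = sorted(r for r, s in latest.items() if s == "CREATE_COMPLETE")
    failed = sorted(r for r, s in latest.items() if s == "CREATE_FAILED")
    return ready, failed, stack_status
```

DescribeStackEvents returns events newest first, so reverse the list (for boto3, `describe_stack_events(StackName=...)["StackEvents"]`) before passing it in.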

What It Doesn’t Do

It’s not a config management tool like Chef or Puppet. Except for what you bake onto your AMI, it has zero software config capabilities and no command dispatch capabilities like Rundeck or mcollective or Fabric, although it should be a good integration point with those tools.

It’s not a PaaS solution like Beanstalk or GAE; you use those when you just have an app you want to deploy to something that’ll run it.  Now, it does erode some use cases – it makes a middle point between “run it all yourself and love the complexity” and “forget configurable system bits, just use PaaS.”  It allows easy reusability, say having a systems guy develop the template and then a dev use it over and over again to host their app, but with more customization than the pure-play PaaSes provide.

It’s not quite like OVF either; OVF is fiddlier and is about defining the guts of a single virtual machine rather than a set of systems with roles and connections.

Competitive Analysis

It’s very similar to Microsoft Azure’s approach with its .cscfg and .csdef files, which are an analogous XML model. You really could fairly call this feature “Amazon implements Azure on Amazon” (just as you could fairly call Elastic Beanstalk “Amazon implements Google App Engine on Amazon”). In fact, the Azure Fabric has a lot more functionality than the primitive Amazon events in this first release. Of course, CloudFormation isn’t limited to Windows the way Azure is, so that’s a pretty good breadth vs. depth tradeoff.

And it’s similar to something like RightScale, and ideally it will encourage them to let customers actually submit their own definitions instead of using the current clunky combo of ServerArrays and ServerTemplates (curl or Web console? Really? Why not a model like this?). RightScale must be in a tizzy right now, though really just integrating with this model should be easy enough.

Where To From Here?

As I alluded to, we actually wrote our own tool like this internally, called PIE, that we’re looking to open source, because we were feeling this whole problem space keenly: an XML model of the whole system and an Apache ZooKeeper-based registry, kinda like CloudFormation and Azure. Does CloudFormation obsolete what we were doing? No. We built it because we wanted a model that could describe cloud systems on multiple clouds and even on-premises systems. The Amazon model will only help you define Amazon bits, so if you are running cross-cloud or hybrid it is of limited value. And I’m sure model visualization tools will come, and a better registry/eventing system will come, but at least for the moment we’re much farther down that path.

Also, the differentiation between “provisioning” tools that control and start systems (like CloudFormation and bcfg2) and “configuration” tools that control and start software (like Chef and Puppet), with some people even separating out “deploy” tools that control and start applications (like Capistrano), is a false dichotomy. I’m all about the “toolchain” approach, but at some point you need a toolbelt. This tool differentiation is one of the more harmful “Dev vs Ops” differentiations.

I hope that this move shows the value of system modeling and helps people understand we need an overarching model that can be used to define it all, not just “Amazon” vs “Azure” or “system packages” vs “developed applications” or “UNIX vs Windows…” True system automation will come from a UNIVERSAL model that can be used to reason about and program to your on-premises systems, your Amazon systems, your Azure systems, your software, your apps, your data, your images and files…

Conclusion

You need to understand CloudFormation, because it is one of the most foundational, high-leverage changes AWS has come out with in some time. I don’t bother to blog about most of the cool new AWS features; they are cool and I enjoy them, but this one is part of a more revolutionary change in the way systems are managed: the whole DevOps thing.


Filed under Cloud, DevOps

Scrum for Operations: Order from Chaos

Welcome to the second installment in Scrum for Operations, a series where I talk about (and go through) the process of doing systems work as part of a DevOps team according to the Scrum methodology. Last time, I introduced the basics of Scrum as it is generally used, and its key benefit of frequently delivering useful functionality. But I already hear the objections: “How can that turn out all right?” It is so light on process that one’s initial inclination is to dismiss it as “cowboy coding”, and we already know not to be “cowboy sysadmins,” right? One’s intuition might be (and mine was in the beginning, I’ll be honest) that this would lead to a metastable process, one that couldn’t be sustained because fundamental, fatal flaws would eventually overtake it.

Well, as I learned after reading up on it and kicking the tires with our dev team here, there are several core disciplines that are agile’s saving graces.

Testing

We ops guys are used to testing being a neglected afterthought in the development process, often tossed over the wall to a QA team that isn’t well integrated into the product. Given that experience, we have a hard time trusting code that’s handed over to us; too often we get it and it doesn’t work!

Well, agile pretty much understands that without pervasive testing, this kind of fast-cycle process is doomed. At its extreme, some practitioners use Test Driven (aka Test First) development, where failing tests must be written first and then code filled in behind them until the tests pass. This produces a large built-in test suite.

Even agile groups that don’t do this almost always have metrics on unit test coverage and a required bar devs must hit.  Here, our desktop software group that’s newly using Scrum has the mandate that “there must be XX% unit test code coverage or you’re not ready to ship.”
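The coverage bar boils down to a simple gate a build can enforce. A minimal sketch; `required_pct` stands in for whatever number your team sets (the "XX%" above), and the line counts would come from your coverage tool.

```python
def coverage_gate(covered_lines, total_lines, required_pct):
    """Return (coverage_pct, ready_to_ship) against a team-chosen bar.

    required_pct is a placeholder for your team's own threshold.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    pct = 100.0 * covered_lines / total_lines
    return pct, pct >= required_pct
```

Wired into the build, a `False` result fails the build instead of relying on someone remembering to check.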

Similarly, acceptance testing (automated continuous testing of user stories vs the code) is a common part of agile. Continuous ongoing testing ensures quality through the dev cycle and reduces the need for time-intensive, and mysteriously always insufficient, big-cycle regression testing.

This is a great culture. And there’s all kinds of different tests – unit test, integration/functional/regression testing, performance testing, fault testing… Starting to get interesting to you?  How about monitoring? In reality application monitoring is a special case of testing – it’s a “lightweight integration regression test.” Our initial approach to DevOps includes making test coverage goals for things like monitors and performance testing, because that plugs into the existing agile mindset well.
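A monitoring coverage goal can be measured the same way code coverage is. This sketch assumes a hypothetical inventory: a list of service names and a mapping from each monitor to the service it checks. It reports the percentage of services with at least one monitor, plus the ones still uncovered.

```python
def monitor_coverage(services, monitors):
    """Return (percent of services monitored, sorted list of unmonitored).

    `services` is an iterable of service names; `monitors` maps a
    monitor name to the service it checks (a hypothetical inventory format).
    """
    services = set(services)
    covered = services & set(monitors.values())
    pct = 100.0 * len(covered) / len(services) if services else 100.0
    return pct, sorted(services - covered)
```

The uncovered list doubles as a ready-made backlog of monitoring work for the next sprint.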

Bonus new terminology thing – the quick acceptance test you do upon release, which we always called “critical path testing,” is now being called “smoke testing” by the hip. Update your dictionaries!
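A smoke test doesn't need a framework to get started. This is a minimal sketch: `http_ok` uses only the standard library's `urllib.request`, and `run_smoke_tests` runs named checks, treating a check that raises as a failure. The check names and URLs you would pass in are your own; nothing here is tied to a specific product.

```python
import urllib.request


def http_ok(url, timeout=5.0):
    """True if url answers with a 2xx status within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, timeouts, refused connections
        return False


def run_smoke_tests(checks):
    """Run (name, callable) checks; return the names of those that failed.

    A check that raises counts as a failure instead of aborting the run.
    """
    failures = []
    for name, check in checks:
        try:
            passed = bool(check())
        except Exception:
            passed = False
        if not passed:
            failures.append(name)
    return failures
```

For example, `run_smoke_tests([("home page", lambda: http_ok("http://localhost:8080/"))])` (a hypothetical URL) returns an empty list when the release looks healthy. Run on a schedule instead of once, the same checks become the lightweight monitors described above.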

A side note on formal QA groups. Just as we are working on DevOps, there has been previous work on how QA teams interact with agile dev teams, and there are a variety of different doctrines on how to split the work – often, it’s devs that are responsible for a lot of the testing. It’s a hard balance – you want the devs to be responsible for some of the testing because the best testing is “close to the code,” but just like with Ops, a real QA team has expertise beyond what a developer can just bolt in with 10% of their attention. Here, we have a dev team and also a remote QA team; devs test their own code on the daily build and then there’s a weekly push to a more stable environment where the QA team does acceptance testing and is moving into performance testing and the like.

Anyway, this endemic focus on testing and automation of testing and testing metrics is the pin that makes this agile flywheel actually turn without just flying off. (You are correct, some agile teams don’t do this – we call those “the unsuccessful ones.”)

And this is for you to do as well! There’s a whole post or series of posts in the topic “What does a unit test mean for something infrastructurey” – it is incumbent on you to figure it out and also have high test coverage with your work.
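As one possible answer for the simplest case: if your infrastructure config is generated by code, that code can be unit tested like anything else. This sketch assumes a hypothetical function that renders an Apache virtual host reverse-proxying to a local Tomcat (`ProxyPass` and `ProxyPassReverse` are real Apache directives, though they also need mod_proxy and mod_proxy_http loaded) and tests it with the standard library's `unittest`.

```python
import unittest


def render_proxy_conf(server_name, app_port):
    """Render a hypothetical Apache vhost that reverse-proxies to local Tomcat."""
    if not 1 <= app_port <= 65535:
        raise ValueError(f"invalid port: {app_port}")
    return "\n".join([
        "<VirtualHost *:80>",
        f"    ServerName {server_name}",
        f"    ProxyPass / http://127.0.0.1:{app_port}/",
        f"    ProxyPassReverse / http://127.0.0.1:{app_port}/",
        "</VirtualHost>",
    ])


class ProxyConfTest(unittest.TestCase):
    def test_proxies_to_app_port(self):
        conf = render_proxy_conf("www.example.com", 8080)
        self.assertIn("ProxyPass / http://127.0.0.1:8080/", conf)
        self.assertIn("ProxyPassReverse / http://127.0.0.1:8080/", conf)

    def test_sets_server_name(self):
        conf = render_proxy_conf("www.example.com", 8080)
        self.assertIn("ServerName www.example.com", conf)

    def test_rejects_bad_port(self):
        with self.assertRaises(ValueError):
            render_proxy_conf("www.example.com", 70000)
```

Saved as a module, this runs with `python -m unittest <module>`. A syntax check such as `apachectl configtest` on a staging box would be the natural next layer up.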

Refactoring

In general, agile dev is the epitome of horizon planning. You know you can’t get all the requirements ahead of time (or if you do get them all ahead of time, what you come out with won’t serve any real human’s need) and similarly preplanned architecture and design often doesn’t survive contact with the scrum. So it’s not “don’t plan or design,” but it’s “plan and design in an ongoing manner.”

This is one of the scariest parts for an ops person – we assume that we get one “bite at the apple”, and once we’ve set up the systems and let in the developers, we’ll never be allowed to change anything without a fight. But developers have this problem internally all the time – one dev is working on a core library or API that other developers are using, and they don’t wait for the core guy to get done before they start. Instead, they have adopted a concept they call refactoring. Refactoring just means that each sprint, you are open to redoing fundamental stuff that needs to change (or that you realize you did kinda hacky in the first place).

Because this is an accepted part of the iterative approach, you get to leverage it as well. In the first iteration, they get the basic Tomcat and MySQL install out of the repo and can get started; then in the second iteration you front it with Apache, or tune the DB for security, or whatnot, and they have to make some changes to fit. I’m not promising no one will ever cry about this, but it’s a part of what makes the culture successful, so it’s there for you to leverage.

And for you to adhere to! Be open to refactoring your infrastructure based on the emerging project needs.
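If your system definition lives in a model, each refactoring is a visible, reviewable diff rather than a surprise. This sketch assumes a hypothetical model format (a dict mapping role name to a settings dict) and reports which roles an iteration added, removed, or changed, using the Tomcat/MySQL then Apache example above.

```python
def diff_models(old, new):
    """Compare two system models (role name -> settings dict).

    Returns (added, removed, changed) role names, each sorted. The model
    format is a hypothetical example.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(r for r in old.keys() & new.keys() if old[r] != new[r])
    return added, removed, changed


iteration_1 = {
    "app": {"software": "tomcat", "port": 8080},
    "db": {"software": "mysql", "tuned": False},
}
iteration_2 = {
    "web": {"software": "apache", "port": 80},  # new front end
    "app": {"software": "tomcat", "port": 8080},
    "db": {"software": "mysql", "tuned": True},  # tuned this sprint
}
```

`diff_models(iteration_1, iteration_2)` reports the new `web` role and the changed `db` role, which is exactly the heads-up the devs need to adjust.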

Source Control

A developer might not even mention this, and most books on agile don’t, because to them it’s so fundamental a discipline that it’s like breathing air. Sadly, the same cannot be said of Ops folks, so I’m mentioning it. When code is changed, the change goes into a shared source control repository, which gives other people on the team visibility into it (a collaboration touchpoint), is a common place to source it from (a deployment touchpoint), and can be used to easily manage multiple versions, even experimental ones, and merge or roll back changes.

This is the most fundamental empowering technology of modern software development (not just agile), and you must adopt it immediately or you have lost. Fair warning. It is the stepping stone that will allow subsequent Cool DevOps Automation to happen.

Conclusion

These three disciplines convert agile/Scrum from a dangerous free-for-all into a technique that gets your product done both more quickly and with higher quality than a waterfall method. I’ll talk further next time about how Ops slots into all this, and how you can fit your systems admin work into a Scrum mindset.

Also note, there are some other agile disciplines surrounding agile design and encapsulation and patterns and whatnot, which I don’t understand well enough yet to speak authoritatively on. Feel free to chime in with other core disciplines if you do!


Filed under DevOps