the agile admin

by Ernest Mueller | April 21, 2011 · 2:40 pm

The Real Lessons To Learn From The Amazon EC2 Outage

As I write this, our ops team is working furiously to bring up systems outside Amazon’s US East region to recover from the widespread outage they are having this morning. Naturally the Twitterverse, and in a day the blogosphere, and in a week the trade rag…axy? will be talking about how the cloud is unreliable and you can’t count on it for high availability.

And of course this is nonsense. These outages make the news because they hit everyone at once, but all the outages people are continually having at their own data centers are just as impactful, if less hit-generating in terms of news stories.

Sure, we’re not happy that some of our systems are down right now. But…

The outage is only affecting one of our beta products, our production SaaS service is still running like a champ, it just can’t scale up right now.
Our cloud product uptime is still way higher than our self hosted uptime. We have network outages, Web system outages, etc. all the time even though we have three data centers and millions of dollars in network gear and redundant Internet links and redundant servers.

People always assume they’d have zero downtime if they were hosting it. Or maybe that they could “make it get fixed” when it does go down. But that’s a false sense of security based on an archaic misconception that having things on premise gives you any more control over them. We run loads of on premise Web applications and a batch of cloud ones, and once we put some Keynote/Gomez/Alertsite against them we determined our cloud systems have much higher uptime.

Now, there are things that Amazon could do to make all this better on customers. In the Amazon SLAs, they say of course you can have super high uptime – if you are running redundantly across AZs and, in this case, regions. But Amazon makes it really unattractive and difficult to do this.

What Amazon Can Do Better

We can work around this issue by bringing up instances in other regions. Sadly, we didn’t already have our AMIs transferred into those regions, and you can only bring up instances off AMIs that are already in those regions. And transferring regions is a pain in the ass. There is absolutely zero reason Amazon doesn’t provide an API call to copy an AMI from region 1 to region 2. Bad on them. I emailed my Amazon account rep and just got back the top Google hits for “Amazon AMI region migrate”. Thanks, I did that already.
We weren’t already running across multiple regions and AZs because of cost. Some of that is the cost of redundancy in and of itself, but more importantly is the hateful way Amazon does reserve pricing, which very much pushes you towards putting everything in one AZ.
Also, redundancy only really works if you have everything, including data, in that AZ. If you are running redundant app servers across 4 AZs, but have your database in one of them – or have a database master in one and slaves in the others – you still get hosed by a particular region downtime.

Amazon needs to have tools that inherently let you distribute your stuff across their systems and needs to make their pricing/reserve strategy friendlier to doing things in what they say is “the right way.”

What We Could Do Better

We weren’t completely prepared for this. Once that region was already borked, it was impossible to migrate AMIs out of it, and there are so many ticky little region specific things all through the Amazon config – security groups, ELBs, etc – doing that on the fly is not possible unless you have specifically done it before, and we hadn’t.

We have an automation solution (PIE) that will regen our entire cloud for us in a short amount of time, but it doesn’t handle the base images, some of which we modify and re-burn from the Amazon ones. We don’t have that process automated and the documentation was out of date since Fedora likes to move their crap around all the time.

In the end, Amazon started coming back just as we got new images done in us-west-1. We’ll certainly work on automating that process, and hope that Amazon will also step up to making it easier for their customers to do so.

Filed under Cloud, DevOps

Tagged as amazon, aws, ec2, outage

by Ernest Mueller | April 13, 2011 · 8:34 am

Our Cloud Products And How We Did It

Hey, I’m not a sales guy, and none of us spend a lot of time on this blog pimping our company’s products, but we’re pretty proud of our work on them and I figured I’d toss them out there as use cases of what an enterprise can do in terms of cloud products if they get their act together!

Some background. Currently all the agile admins (myself, Peco, and James) work together in R&D at National Instruments. It’s funny, we used to work together on the Web Systems team that ran the ni.com Web site, but then people went their own ways to different teams or even different companies. Then we decided to put the dream team back together to run our new SaaS products.

About NI

National Instruments (hereafter, NI) is a 5000+ person global company that makes hardware and software for test & measurement, industrial control, and graphical system design. Real Poindextery engineering stuff. Wireless sensors and data acquisition, embedded and real-time, simulation and modeling. Our stuff is used to program the Lego Mindstorms NXT robots as well as control CERN’s Large Hadron Collider. When a crazed highlander whacks a test dummy on Deadliest Warrior and Max the techie looks at readouts of the forces generated, we are there.

About LabVIEW

Our main software product is LabVIEW. Despite my being an electrical engineer by degree, we never used LabVIEW in school (this was a very long time ago, I’ll note; most programs use it nowadays), so it wasn’t till I joined NI that I saw it in action. It’s a graphical dataflow programming language. I assumed that was BS when I heard it. I had so many companies try to sell me “graphical” programming over the years, like all those crappy 4GLs back in the ’90s, that I figured that was just an unachieved myth. But no, it’s a real visual programming language that’s worked like a champ for more than 20 years. In certain ways it’s very bad ass: it does parallelism for you and can be compiled and dropped onto an FPGA. It’s remained niche-ey and hasn’t been widely adopted outside the engineering world, however, due to company focus more than anything else.

Anyway, we decided it was high time we started leveraging cloud technologies in our products, so we created a DevOps team here in NI’s LabVIEW R&D department with a bunch of people that know what they’re doing, and started cranking on some SaaS products for our customers! We’ve delivered two and have announced a third that’s in progress.

Cloud Product #1: LabVIEW Web UI Builder

First out of the gate – LabVIEW Web UI Builder. It went 1.0 late last year. Go try it for free! It’s a Silverlight-based RIA “light” version of LabVIEW – you can visually program, interface with hardware and/or Web services. As internal demos we even had people write things like “Duck Hunt” and “Frogger” in it – it’s like Flash programming but way less of a pain in the ass. You can run in browser or out of browser and save your apps to the cloud or to your local box. It’s a “freemium” model – totally free to code and run your apps, but you have to pay for a license to compile your apps for deployment somewhere else – and that somewhere else can be a Web server like Apache or IIS, or it can be an embedded hardware target like a sensor node. The RIA approach means the UI can be placed on a very low footprint target because it runs in the browser, it just has to get data/interface with the control API of whatever it’s on.

It’s pretty snazzy. If you are curious about “graphical programming” and think it is probably BS, give it a spin for a couple minutes and see what you can do without all that “typing.”

A different R&D team wrote the Silverlight code, we wrote the back end Web services, did the cloud infrastructure, ops support structure, authentication, security, etc. It runs on Amazon Web Services.

Cloud Product #2: LabVIEW FPGA Compile Cloud

This one’s still in beta, but it’s basically ready to roll. For non-engineers, an FPGA (field programmable gate array) is essentially a rewritable chip. You get the speed benefits of being on hardware – not as fast as an ASIC but way faster than running code on a general purpose computer – as well as being able to change the software later.

We have a version of LabVIEW, LabVIEW FPGA, used to target LabVIEW programs to an FPGA chip. Compilation of these programs can take a long time, usually a number of hours for complex designs. Furthermore, the software required for the compilation is large and getting more diverse as there are more and more chips out there (each pretty much has its own dedicated compiler).

So, cloud to the rescue. The FPGA Compile Cloud is a simple concept – when you hit ‘compile’ it just outsources the compile to a bunch of servers in the cloud instead of locking up your workstation for hours (assuming you’ve bought a subscription). FPGA compilations have everything they need with them, there are no unique compile environments to set up or anything, so it’s very commoditizable.

The back end for this isn’t as simple as the one for UI Builder, which is just cloud storage and load balanced compile servers – we had to implement custom scaling for the large and expensive compile workers, and it required more extensive monitoring, performance, and security work. It’s running on Amazon too. We got to reuse a large amount of the infrastructure we put in place for systems management and authentication for UI Builder.

Cloud Product #3: Technical Data Cloud

It’s still in development, but we’ve announced it so I get to talk about it! The idea behind the Technical Data Cloud is that more and more people need to collect sensor data, but they don’t want to fool with the management of it. They want to plop some sensors down and have the acquired data “go to the cloud!” for storage, visualization, and later analysis. There are other folks doing this already, like the very cool Pachube (pronounced “patch-bay”, there’s a LabVIEW library for talking to it), and it seems everyone wants to take their sensors to the cloud, so we’re looking at making one that’s industrial strength.

For this one we are pulling out our big guns, our data specialist team in Aachen, Germany. We are also being careful to develop it in an open way – the primary interface will be RESTful HTTP Web services, though LabVIEW APIs and hardware links will of course be a priority.

This one had a big technical twist for us – we’re implementing it on Microsoft Windows Azure, the MS guys’ cloud offering. Our org is doing a lot of .NET development and finding a lot of strategic alignment with Microsoft, so we thought we’d kick the tires on their cloud. I’m an old Linux/open source bigot and to be honest I didn’t expect it to make the grade, but once we got up to speed on it I found it was a pretty good bit of implementation. It did mean we had to do significant expansion of the underlying platform we are reusing for all these products – just supporting Linux and Windows instances in Amazon already made us toss a lot of insufficiently open solutions in the garbage bin, and these two cloud worlds are very different as well.

How We Did It

I find nothing more instructive than finding out the details – organizational, technical, etc. – of how people really implement solutions in their own shops. So in the interests of openness and helping out others, I’m going to do a series on how we did it! I figure it’ll be in about three parts, most likely:

How We Did It: People
How We Did It: Process
How We Did It: Tools and Technologies

If there’s something you want to hear about when I cover these areas, just ask in the comments! I can’t share everything, especially for unreleased products, but promise to be as open as I can without someone from Legal coming down here and Tasering me.

Filed under Cloud, DevOps

Tagged as agile, amazon, azure, Cloud, DevOps, labview, ni, SaaS

by Ernest Mueller | April 11, 2011 · 12:10 pm

Security and the Rise (and Fall?) of DevOps

As I’ve been involved with DevOps and its approach of blending development and operations staff together to create better products, I’ve started to see similar trends develop in the security space. I think there’s some informative parallels where both can learn from each other and perhaps avoid some pitfalls.

Here’s a recent article entitled “Agile: Most security guys are useless” that states the problem succinctly. In successful and agile orgs, the predominant mindset is that if you’re not touching the product, you are semi-useless overhead. And there’s some truth to that. When people are segregated into other “service” orgs – like operations or security – the us vs. them mindset predominates and strangles innovation in its crib.

The main initial drive of agile was to break down that wall between the devs and the “business”, but walls remain that need similar breaking down. With DevOps, operations organizations faced with this same problem are innovating new approaches; a collaborative approach with developers and operations staff working together on the product as part of the same team. It’s working great for those who are trying it, from the big Web shops like Facebook to the enterprise guys like us here at NI. The movement is gathering steam and it seems clear to those of us doing it this way that it’s going to be a successful and disruptive pattern for adopters.

But let’s not pat ourselves on the back too much just yet. We still have a lot of opportunity to screw it up. Let’s review an example from another area.

In the security world, there is a whole organization, OWASP (the Open Web Application Security Project) whose goal is to promote and enable application security. Security people and developers, working together! Dev+Sec already exists! Or so the plan was.

However, recently there have been some “shots across the bow” in the OWASP community. Read Security People vs Developers and especially OWASP: Has It Reached A Tipping Point? The latter is by Mark Curphey, who started OWASP. He basically says OWASP is becoming irrelevant because it’s leaving developers behind. It’s becoming about “security professionals” selling tools and there’s few developers to be found in the community any more.

And this is absolutely true. We host the Austin OWASP chapter here at NI’s Austin campus, and two of the officers are NI employees. We make sure and invite NI developers to come to OWASP. Few do, at least not after the first couple times. I asked some of the devs on our team why not, and here’s some answers I got.

I want to leave sessions by saying, “I need to think about this the next time I code”. I leave sessions by saying, “that was cool, I can talk about this at a happy hour”. If I could do the former, I’d probably attend most/all the sessions.
A lot of the sessions don’t seem business focused and it is hard to relate. Demos are nicer; but a lot of times they are so specific (for example: specific OS + specific Java Version + specific javascript library = hackable) that it’s not actionable.
“Security people” think “developers” don’t know what they are doing and don’t care about security. Which to developers is offensive. We like to write secure applications; sometimes we just find the bugs too late….
I’ve gone to, I think, 4 OWASP meetings. Of those, I probably would only have recommended one of them to others – Michael Howard’s. I think it helped that he was a well-known speaker and seemed to have a developer focus. So, well-known speakers, with a compelling and relevant subject.. Even then, the time has to be weighed against other priorities. For example, today’s meeting sounds interesting, but not particularly relevant. I’ll probably skip it.

In the end, the content at these meetings is more for security pros like pen testers, or for tool buyers in security or sysadmin groups. “How do I code more securely” is the alleged point of the group but frankly 90% of the activity is around scanners and crackers and all kinds of stuff that is fine but should be simple testing steps after the code’s written securely in the first place.

As a result there have been interesting ideas coming from the security community that are reminiscent of DevOps concepts. Pen tester @atdre did a talk here to the Austin OWASP chapter about how security testers engaging with agile teams “from the outside” are failing, and shouldn’t we instead embed them on the team as their “security buddy.” (I love that term. Security buddy. I hate my “compliance auditor” without even meeting the poor bastard, but I like my security buddy already.) At the OWASP convention LASCON, Matt Tesauro delivered a great keynote similarly trying to refocus the group back on the core problem of developing secure software; in fact, they’re co-sponsoring a movement called “Rugged” that has a manifesto similar to the Agile Manifesto but is focused on security, availability, reliability, et cetera. (As a result it’s of interest to us sysadmin types, who are often saddled with somehow applying those attributes in production to someone else’s code…)

The DevOps community is already running the risk of “leaving the devs behind” too. I love all my buddies at Opscode and DTO and Puppet Labs and Thoughtworks and all. But a lot of DevOps discussions have started to be completely sysadmin focused as well; a litany of tools you can use for provisioning or monitoring or CI. And that wouldn’t be so bad if there was a real entry point for developers – “Here’s how you as a developer interact with chef to deploy your code,” “Here’s how you make your code monitorable”. But those are often fringe discussions around the core content which often mainly warms the cockles of a UNIX sysadmin’s heart. Why do any of my devs want to see a presentation on how to install Puppet? Well, that’s what they got at a recent Austin Cloud User Group meeting.

As a result, my devs have stopped coming to DevOps events. When I ask them why, I get answers similar to the ones above for why they’re not attending OWASP events any more. They’re just not hearing anything that is actionable from the developer point of view. It’s not worth the two hours of their valuable time to come to something that’s not at all targeted at them.

And that’s eventually going to scuttle DevOps if we let it happen, just as it’ll scuttle OWASP if it continues there. The core value of agile is PEOPLE over processes and tools, COLLABORATION over negotiation. If you are leaving the collaboration behind and just focusing on tools, you will eventually fail, just in a more spectacular and automated fashion.

The focus at DevOpsDays US 2010 was great, it was all about culture, nothing about tools. But that culture talk hasn’t driven down to anything more actionable, so tools are just rising up to fill the gap.

In my talk at that DevOpsDays I likened these new tools and techniques to the introduction of the Minie ball to rifles during the Civil War. In that war, they adopted new tools and then retained their same old tactics, walking up close in lines designed for weapons with much shorter ranges and much lower accuracy – and the slaughter was profound.

All our new DevOps tools are great, but in the same way, if we don’t adapt our way of thinking to them, they will make our lives worse, not better, for all their vaunted efficiency. You can do the wrong thing en masse and more quickly. The slaughter will similarly be profound.

A sysadmin suddenly deciding to code his own tools isn’t really the heart of DevOps. It’s fine and good, and I like seeing more tools created by domain experts. But the heart of DevOps, where you will really see the benefits in hard ROI, is developers and operations folks collaborating on real end-consumer products.

If you are doing anything DevOpsey, please think about “Why would a developer care about this?” How is it actionable to them, how does it make their lives easier? I’m a sysadmin primarily, so I love stuff that makes my job easier, but I’ve learned over the years that when I can get our devs to leverage something, that’s when it really takes off and gives value.

The same thing applies to the people on the security side. Why do we have this huge set of tools and techniques, of OWASP Top 10s and Live CDs and Metasploits and about a thousand wonderful little gadgets, but code is pretty much as shitty and insecure as it was 20 years ago? Because all those things try to solve the problem from the outside, instead of targeting the core of the matter, which is developers developing secure code in the first place. And to do that, it’s more of a hearts-and-minds problem than a tools-and-processes problem.

That’s a core realization that operations folks, and security folks, and testing folks, and probably a bunch of other folks need to reach, deeply internalize, and let change the way they look at the world and how they conduct their work.

Filed under DevOps, Security

Tagged as agile, appsec, DevOps, owasp, rugged, Security, slaughter

by Ernest Mueller | March 31, 2011 · 9:26 am

Why Amazon Reserve Instances Torment Me

We’ve been using over 100 Amazon EC2 instances for a year now, but I’ve just now made my first reserve instance purchase. For the untutored, reserve instances are where you pay a yearly upfront per instance and you get a much, much lower hourly cost. On its face, it’s a good deal – take a normal Large instance you’d use for a database. For a Linux one, it’s $0.34 per hour. Or you can pay $910 up front for the year, and then it’s only $0.12 per hour. So theoretically, it takes your yearly cost from $2978.40 to $1961.20. A great deal, right?

Well, not so much. The devil is in the details.

First of all, you have to make sure and be running all those instances all the time. If you buy a reserve instance and then don’t use it some of the time, you immediately start cutting into your savings. The crossover is at 172 days – if you don’t run the instance at least 172 days out of the year then you are going upside down on the deal.
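The arithmetic behind those numbers is easy to check. Here is a minimal sketch using only the prices quoted above; the function and constant names are mine, made up for illustration, and the rates are the 2011 figures from this post, not anything current.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year

# Prices quoted in the post for a Linux Large instance (2011 figures).
ON_DEMAND_HOURLY = 0.34   # dollars per hour, no commitment
RESERVED_UPFRONT = 910.0  # dollars, one-year upfront fee
RESERVED_HOURLY = 0.12    # dollars per hour after paying the upfront fee


def on_demand_cost(hours):
    """Total cost of running on demand for the given number of hours."""
    return ON_DEMAND_HOURLY * hours


def reserved_cost(hours):
    """Total cost of a reserved instance used for the given number of hours."""
    return RESERVED_UPFRONT + RESERVED_HOURLY * hours


def break_even_hours():
    """Hours of use at which reserved and on-demand cost the same."""
    return RESERVED_UPFRONT / (ON_DEMAND_HOURLY - RESERVED_HOURLY)


if __name__ == "__main__":
    print(f"on demand, full year: ${on_demand_cost(HOURS_PER_YEAR):.2f}")
    print(f"reserved, full year:  ${reserved_cost(HOURS_PER_YEAR):.2f}")
    print(f"break-even: {break_even_hours() / 24:.1f} days of use")
```

The break-even lands a bit past 172 days, so anything you run for less than roughly half the year is cheaper on demand.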

But what’s the big deal, you ask? Sure, in the cloud you are probably (and should be!) scaling up and down all the time, but as long as you reserve up to your low water mark it should work out, right?

So the big second problem is that when you reserve instances, you have to specify everything about that instance. You aren’t reserving “10 instances”, or even “10 large instances” – you have to specify:

Platform (UNIX/Linux, UNIX/Linux VPC, SUSE Linux, Windows, Windows VPC, or Windows with SQL Server)
Instance Type (m1.small, etc.)
AZ (e.g. us-east-1b)

And tenancy and term. So you have to reserve “a small multitenant Linux instance in us-east-1b for one year.” But having to specify down to this level is really problematic in any kind of dynamic environment.

Let’s say you buy 10 m1.large instances for your databases, and then you realize later you really need to move up to an m1.xlarge. Well, tough. You can, but if you don’t have 10 other things to run on those larges, you lose money. Or if you decide to change OS. One of our biggest expenditures is our compile farm workers, and on those we hope to move from Windows to Linux once we get the software issues worked out, and we’re experimenting with best cost/performance on different instance sizes. I’m effectively blocked from buying reserve for those, since if I do it’ll put a stop to our ability to innovate.

And more subtly, let’s say you’re doing dynamic scaling and splitting across AZs like they always say you should do for availability purposes. Well, if I’m running 20 instances, and scaling them across 1b and 1c, I am not guaranteed I’m always running 10 in 1b and 10 in 1c, it’s more random than that. Instead of buying 20 reserve, you instead have to buy say 7 in 1b and 7 in 1c, to make sure you don’t end up losing money.
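To make the per-AZ problem concrete, here is a rough sketch of how you might work out what is safe to reserve. The sample counts are hypothetical, picked to match the 7-and-7 example above, and safe_reservation is a made-up helper, not anything Amazon provides.

```python
def safe_reservation(samples):
    """Return the lowest observed instance count per AZ.

    samples is a list of dicts, each mapping an AZ name to the number of
    instances running there at one point in time. Reserving more than the
    per-AZ minimum means paying for reserved capacity that sometimes idles.
    """
    zones = set()
    for sample in samples:
        zones.update(sample)
    return {az: min(s.get(az, 0) for s in samples) for az in sorted(zones)}


# Hypothetical observations: always 20 instances total, split unevenly.
samples = [
    {"us-east-1b": 10, "us-east-1c": 10},
    {"us-east-1b": 13, "us-east-1c": 7},
    {"us-east-1b": 7, "us-east-1c": 13},
    {"us-east-1b": 9, "us-east-1c": 11},
]

if __name__ == "__main__":
    plan = safe_reservation(samples)
    print(plan, "total reservable:", sum(plan.values()))
```

Even though the total never drops below 20, only 14 of those instances can be reserved without risk, because the reservation is tied to the AZ and not to the region.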

Heck, they even differentiate between Linux and Suse and Linux VPC instances, which clearly crosses over into annoyingly picky territory.

As a result of all this, it is pretty undesirable to buy reserve instances unless you have a very stable environment, both technically and scale-wise. That sentence doesn’t describe the typical cloud use case in my opinion.

I understand, obviously, why they are doing this. From a capacity planning standpoint, it’s best for them if they make you specify everything. But what I don’t think they understand is that this cuts into people willing to buy reserve, and reserve is not only upfront money but also a lockin period, which should be grotesquely attractive to a business. I put off buying reserve for a year because of this, and even now that I’ve done it I’m not buying near as many reserve as I could be because I have to hedge my bets against ANY changes to my service. It seems to me that this also degrades the alleged point of reserves, which is capacity planning – if you’re so picky about it that no one buys reserve and 95% of your instances are on demand, then you can’t plan real well can you?

What Amazon needs to do is meet customers halfway. It’s all a probabilities game anyway. They lose specificity of each given reserve request, but get many more reserve requests (and all the benefits they convey – money, lockin, capacity planning info) in return.

Let’s look at each axis of inflexibility and analyze it.

Size. Sure, they have to allocate machines, right? But I assume they understand they are using this thing called “virtualization.” If I want to trade in 20 reserved small instances for 5 large instances (each large is 4x a small), why not? It loses them nothing to allow this. They just have to make the effort to allow it to happen in their console/APIs. I can understand needing to reserve a certain number of “units” but those should be flexible on exact instance types at a given time.
OS. Why on God’s green earth do I need to specify OS? Again, virtualized right? Is it so they can buy enough Windows licenses from Microsoft? Cry me a river. This one needs to leave immediately and never come back.
AZ. This is annoying from the user POV but probably the most necessary from the Amazon POV because they have to put enough hardware in each data center, right? I do think they should try to make this a per region and not a per AZ limit, so I’m just reserving “us-east” in Virginia and not the specific AZ, that would accommodate all my use cases.
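The “units” idea from the Size item could look something like this. It is purely a sketch of what I am wishing for, not how Amazon’s reservations work; the only ratio taken from above is that a large is 4x a small, and the m1.xlarge weight of 8 is my own assumption.

```python
# Relative size of each instance type, in "small" units.
UNITS = {
    "m1.small": 1,
    "m1.large": 4,   # each large is 4x a small, as stated above
    "m1.xlarge": 8,  # assumption for illustration: twice a large
}


def to_units(reservation):
    """Convert {instance_type: count} into a total number of units."""
    return sum(UNITS[itype] * count for itype, count in reservation.items())


def can_trade(current, desired):
    """True if the desired mix fits within the units already reserved."""
    return to_units(desired) <= to_units(current)


if __name__ == "__main__":
    current = {"m1.small": 20}
    print(can_trade(current, {"m1.large": 5}))   # 20 units for 20 units
    print(can_trade(current, {"m1.large": 6}))   # 24 units will not fit
```

Under a scheme like this, the reservation commits you to a capacity level and leaves the exact instance mix up to you.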

In the end, one of the reasons people move to the cloud in the first place is to get rid of the constraints of hardware. When Amazon just puts those constraints back in place, it becomes undesirable. Frankly even now, I tried to just pay Amazon up front rather than actually buy reserve, but they’re not really enterprise friendly yet from a finance point of view so I couldn’t make that happen, so in the end I reluctantly bought reserve. The analytics around it are lacking too – I can’t look in the Amazon console and see “Yes, you’re using all 9 of your large/linux/us-east-1b instances.”

Amazon keeps innovating greatly in the technology space but in terms of the customer interaction space, they need a lot of help – they’re not the only game in town, and people with less technically sophisticated but more aggressive customer services/support options will erode their market share. I can’t help but think that Rackspace’s play with backing OpenStack seems to be the purest example of this – “Anyone can run the same cloud we do, but we are selling our ‘fanatical support'” is the message.

Filed under Cloud

Tagged as amazon, aws, Cloud, ec2, reserve

by Ernest Mueller | March 16, 2011 · 11:17 am

DevOps In Action Podcast

While at SXSW, I recorded an episode of the IT Management and Cloud Podcast with Michael Coté and John Willis. You can find it on Coté’s People Over Process blog! Sorry about the background noise, we were doing it in the upstairs bar at the Driskill.

Filed under Cloud, Conferences, DevOps

Tagged as DevOps, sxsw, sxswi

by Ernest Mueller | March 14, 2011 · 10:08 am

SXSW Interactive 2011 Day Two

Day 2 started off a bit rocky but then got really good. I tried to get to the Lean Startup: King of the Apps Showdown session, but the shuttle buses take a long route and are few and far between, so I got there pretty late. I saw snapappointments.com, snapwebsites, Planzai, Icebreakr, and mapomatic all get their app and pitch critiqued by a panel of folks – Robert Scoble, Eric Ries, Dave McClure, Bill Boebel, and Stacey Higginbotham. They demanded mapomatic release right away, and told snapwebsites to remove half of his features. I was ambivalent about this advice – I can see streamlining the UI to hide advanced functionality, but “remove functionality”? Sure, a simple app has a wider reach, but if you can effectively provide an advanced mode/version with more functionality that professionals need, wouldn’t that be better? I just don’t like how all these apps are degenerating into one very narrow function “plus it checks you into Foursquare!!!!” Bah.

They joked about how many of the candidates had “Snap” in their name, and then on the bus on the way back I met a girl from TeamSnap. They have a site that lets you arrange sports teams, even take Paypal payments for team dues, very slick.

And congratulations to friend of the blog Lenny Rachitsky (@lennysan) of Localmind, who has been singled out by Scoble as “best SXSW app so far!” You should all go download it from the App Store now.

Then I tried to go to the session on How To Innovate At Big Companies, with Gene Kim of Visible Ops fame – I left the AT&T center early and went all the way to the Hyatt. But it was so full they had 30 people standing outside only allowed in if anyone left. So I went over to the Hilton to try to catch Mistakes I Made Building Netflix for the iPhone. And it also had a line of “one in, one out” standby people. Son of a bitch. They better be videoing this stuff. Anyway, at this point it was too far into the slot so I just went to the Screenburn Arcade, which was fun. Got my picture taken with Loren Wiseman of Traveller fame – apparently the venerable RPG is being turned into an iPhone MMO!

To recuperate from our disappointing morning, Peco and I went and had a burger at Casino El Camino, which takes a ridiculously long time but is worth it, they have the best burgers in Austin. We hooked up with Coté and John Willis there, and then went to the Etsy Code as Craft: Moving Fast At Scale event which was brilliant and DevOpsey. Continuous deployment, only coding off trunk, logging and metrics, “dashboard driven development”, and more. A lot of the topics have info on them on the Etsy Code as Craft blog – they didn’t record the talk but promise to record it the next time they give it and put it up.

I took a lot of notes but the high points were:

They have open sourced their log collection/graphing tool, Logster, and their stats collection daemon, statsd.
They perform hundreds of releases a month but only had six “bad deploys” over the course of a year by combining one-button deploy, testing, feature flags, and a theory of roll-forward. No more organized releases, release managers, rollbacks, all that stuff.
They have a cute “deployinator” dashboard that lets people (and not just coders, they have more people deploying than they have developers) do the deploys – they’ll open source it, but stressed that it’s not much technically, the culture and practices are 99% of the work to get to this.
They use a lot of chef but don’t use it for code deploys because that’s a simple task that they need more control over.
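For a feel of how lightweight the statsd side of this is, here is a minimal sketch of bumping a counter from application code. It relies on the plain-text statsd wire format (bucket:value|c sent over UDP, port 8125 by default); the bucket name and helper functions are made up for illustration and this is not Etsy’s client code.

```python
import socket


def format_counter(bucket, value=1):
    """Build a statsd counter packet, e.g. b'deploys.web:1|c'."""
    return f"{bucket}:{value}|c".encode("ascii")


def send_counter(bucket, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a counter increment to a statsd daemon over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(bucket, value), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical bucket name; count one deploy of the web tier.
    send_counter("deploys.web")
```

Because it is UDP, the app never blocks waiting on the metrics system, which is a big part of why sprinkling counters through the code is cheap enough to do everywhere.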

That was a two hour event, so then after a little VIP partyin’ atop Fogo de Chao, we packed it in for the day.


Filed under Conferences

Tagged as DevOps, interactive, sxsw, sxswi

by Ernest Mueller | March 12, 2011 · 8:24 am

SXSW Interactive 2011 Day One

We started out the week gently, with a light Friday (why am I still so tired then?). Two of your favorite Agile Admins, Peco and Ernest, were down at the Austin Convention Center to experience SXSW Interactive! The third Agile Admin, James, was busy giving a talk at the nearby Security B-Sides Austin.

After getting down there early, getting badges, and getting oriented, we saw Jason Calacanis interview Tim O’Reilly. O’Reilly gave a lot of interesting insight into the development of innovation, tracing it through the Internet, open source, Web 2.0/social media/user contributed content, cloud, and Big Data.

The single most interesting thing he said, though, was a side note that illustrated how hard it is for companies to maintain a real vision over time, especially once they get big and various stakeholders’ needs conflict. He talked about how many people have become billionaires off of O’Reilly ideas and how there’s been pressure for him to sell out or cash in. Cisco, for example, offered to buy them, noting that “You guys are always first on the scene with the cool stuff but you fail to exploit it.” Tim always rejected those offers, though he did note he is often conflicted because of all the people he has working for him, and because he isn’t making all those great people the money a sale would bring.

Next, Peco saw Google’s Marissa Mayer talk (he’ll have to share what went on there) and I kinda played hooky by going to a session about “daddy bloggers,” as that’s a personal interest of mine.

Then we both went to a session that was supposed to be about “The Connected Car: Driving Technology” and automobile telematics, but it sucked. It was some car guys and a lady from Pandora telling us over and over again that “cars hooked up to networks and stuff are cool.” I am willing to put up with about 5 minutes of telling me something’s cool, but if it doesn’t give way quickly to you SHOWING me that it’s cool, I’m out. We bailed 30 minutes in, along with a lot of the other attendees.

Then we went down to the Austin Music Hall for four hours of Ignite, a format where presenters get 5 minutes and auto-advancing slides to make a point. The topic was “2021 Visions of the Future.” Besides the talks and some bands, there were also a bunch of Arduino and robotics and various weird Maker type booths, which was fun. The crowd was really varied, in fact there were other people there from NI who had been invited via various completely different vectors (UT School of Engineering, Austin Ventures startup stuff, SXSWi). [Side note – there was a contest to drop an egg from the balcony safely using only 4 sheets of paper and a couple feet of tape, and I was one of the winners and got a Roku for my troubles! Engineering education FTW.]

Some presentations were good, others incoherent, but a common theme throughout the day was people doing things they have a passion for, and worrying about the money later. This was a theme in Tim O’Reilly’s talk and it pervaded the presentations at Ignite. Really, most people aren’t in the field they’re in because it’s the thing with the highest ROI they qualify for; they are in it because they have some kind of passion for it. A lot of life then tries to stomp that passion out, but the enabling factors of the Web, DIY, etc. are making it so you can flip off “the Man” and pursue your passion yourself if you want to. As a large company we struggle with that. We try to promote people following their passions and enable internal entrepreneurship at a high level, but the natural grinding wheels of a large organization grind that into meal at the middle levels.

Then we got free ice cream from the Free Ice Cream Man and headed home. I’m getting sick and feel whupped, but it was energizing to see so many people doing so many great things – and let me tell you, SXSWi is getting HUGE. It’s nothing like the first year I went, when there were maybe ten sessions and the conference center was largely empty. The place was packed; companies have bought out and transformed nearby buildings – a bar across the street is now the Playstation Lounge, with a huge video wall on the roof and stuff; CNN had a giant neon sign installed at one restaurant… I can tell the economy’s looking up and that Interactive is hot because Lordy there’s money getting thrown at this thing.

See the rest of my pictures from Day One of SXSW here!


Filed under Conferences

Tagged as interactive, sxsw, sxswi

by Ernest Mueller | March 10, 2011 · 1:50 pm

SXSW Tips If You’re From Out Of Town

I just went to lunch with a visitor in town for SXSW (@lennysan from Localmind) and it reminded me of some of the ‘gotchas’ that someone not from Austin may not know.

If your hotel doesn’t have a shuttle, and it is not immediately inside the downtown area, you will need a car (unless you are real close to the one light rail line, and I wouldn’t bet on it not being totally overloaded). Austin is big and not pedestrian friendly except in the area IMMEDIATELY adjacent to the river.
Here’s an awesome downtown parking guide from Community Impact. Shows where they are, what the rates are, whether they’re lot or structure.
You can rent your gun at the airport when you land, to avoid spending time finding another place to do it.
Just because a place has a big “BBQ” sign doesn’t mean you should go eat barbecue there. We were eating real BBQ at Rudy’s and looking across the highway at the Bone Daddy’s that has a big ol’ sign on it saying BBQ. I’m not saying Bone Daddy’s doesn’t have things to recommend it, but BBQ isn’t really its strength. If you’re downtown, eat at the Iron Works, and if anyone is organizing a field trip out of town to the really golden places in Taylor or Llano, go along.
Someone asked on Localmind about breakfast during SXSW – here, breakfast is breakfast tacos, except for hotel restaurants and Denny’s.

Any more tips for furriners? Post them here!


Filed under Conferences
