The Cloud Procurement Pecking Order

I was planning to go to this meeting here in town about “Preparing for the post-IaaS phase of cloud adoption” and it brought home to me how backwards people are generally thinking about cloud. So now you get Ernest’s Cloud Rant of the Day.

What people are doing is moving in order of comfort, basically. “I’ll start with private cloud… Then maybe public IaaS… Eventually we’ll look at that other whizbang stuff.” But here’s what your decision path should be instead (there’s a code sketch of it after the list).

Cloud Procurement Flowchart

  1. Is it available as a SaaS solution?  If so, use that. You shouldn’t need to host servers or write code for many of your needs – everything from email to ERP is commoditized nowadays.
  2. Can I do it in a public PaaS?  Then use a public PaaS (Heroku/Beanstalk/Google App Engine/Azure), unless you have some real (not FUD) requirements to do it in house.
  3. Can I do it in a private PaaS? Then use Cloud Foundry or similar. Or do you really (for non-FUD reasons) need access to the hardware?
  4. Can I do it in public IaaS? Then use Amazon. Or do you really (for non-FUD reasons) need it “on premise”? (It’s probably not really on premise anyway, but in some datacenter you’re leasing – and how is that different from being outsourced in the cloud, exactly?)
  5. Can I do it in a private cloud? This is your final recourse before doing it the old-fashioned way – unless you have extremely unique hardware requirements, you probably can. You can also do hybrid cloud – basically private cloud plus public cloud (IaaS only, really) – which gets you some of the IaaS benefits while complicating your architecture.
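If it helps, here’s that decision path as a minimal code sketch – the question names are invented, and the answers have to be honest (non-FUD) ones:

```python
def procurement_recommendation(need):
    # Walk down the pecking order; only descend for real (non-FUD) reasons.
    # `need` is a hypothetical dict of honestly answered yes/no questions.
    if need["available_as_saas"]:
        return "SaaS"
    if not need["must_run_in_house"]:
        return "public PaaS (Heroku/Beanstalk/Google App Engine/Azure)"
    if not need["needs_hardware_access"]:
        return "private PaaS (Cloud Foundry or similar)"
    if not need["must_be_on_premise"]:
        return "public IaaS (Amazon)"
    return "private cloud - the last stop before the old-fashioned way"
```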

What About The Cost?

This seems to be inverted from how people are inching into the cloud. But the lower on this list you are, the less additional value you are getting from the solution (assuming the same price point). You should instead be reluctantly dragged into these lower levels – which require more effort and often (though not always) more expense. “But what about the cost,” you say, “doesn’t the cloud get more expensive than me running a couple of servers?”

You need to keep in mind the real costs of your infrastructure when you do this – I see a lot of people spending a lot of effort on private cloud who really shouldn’t be. If you simply compare “buying servers” with “cost per month in Amazon” it can seem like you need to go hybrid after a couple hundred thousand dollars appear on your bill. But consider the following (a back-of-the-envelope sketch follows the list):

1. Make sure you are taking into account your fully loaded cost (data center, power, cooling, etc.) of all assets (servers, storage, network…) you are using to do this privately. Use the real numbers, not the “funny money” numbers – at a previous company we allocated network and other shared costs across the entire company, while “our IT budget” had to pay for servers. Don’t be a goon; consider what it’s costing your entire company. Storage especially is way cheaper in the cloud versus enterprise SANs.

2. Make sure you are taking into account the cost of the manpower to run it. And that’s not just their salary (fully loaded with benefits/bonuses), plus the proportion of each layer of management going up that has to deal with their concerns (even if the director only has to spend 30% of his time messing with the data center team, and the VP 10%, and the CTO 5%, and the CEO 1% – that’s a lot of freaking money). It’s also the opportunity cost of having people (smart technical people) doing your plumbing instead of doing things to move your company forward. I would argue that instead of putting in the employee’s salary, you’d do better to put in your revenue per employee! Why? Because for that same money you could have someone improving the product, making sales, etc., and making you additional revenue. If all you look at is “cost reduction,” you are probably divorced enough from the business goals of your organization that you are not making good decisions. This isn’t to say you don’t need any of that manpower, but ideally, with more of the plumbing outsourced, you can turn their technical skills to more productive use.

3. Make sure you are taking into account the additional lag time and the cost of the time-to-market delay from DIYing. Some people couch this as just an innovation concern – “well, if you’re a small, quick-moving, innovative firm or startup, then this velocity matters to you – if you’re a larger enterprise, with yearly budget cycles, not so much.” That’s not true. Assuming you are implementing all this stuff with some end goal in mind, you are burning value along with time the longer it takes you to deliver it – we like to call that cost of delay. Heck, just the plain cost of money over that period is significant – I’ve seen companies go through quite a set of gyrations to be able to bill 30 days earlier to get that additional benefit; if you can deliver projects a month earlier by leveraging reusable work (which is all SaaS/PaaS/IaaS are), then you accelerate your cash flow. If you have to wait 12 months for the IT group to get a private cloud working, you are effectively losing the benefit of your deliverable × 12 months.

4. Account for complexity.  The problem with “hybrid cloud,” in most implementations, is that it’s not seamless from on prem to public, and therefore your app architecture has to be doubly complicated.  In a previous position where I ran a large SaaS service, we were spread across AWS (virtual everything) and Rackspace (vserver, F5 LBs, etc.) and it was a total nightmare – we were trying to migrate all the way out to the cloud just so we could delete half of the cruft in all our code that touched the infrastructure – complexity that caused production issues (frequently) and slowed our rate of delivering new functionality. The KISS principle is wrathful when ignored.
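Here’s the back-of-the-envelope sketch promised above, pulling points 1–3 together. Every number is invented purely for illustration – plug in your own:

```python
# Point 1: fully loaded DIY infrastructure cost (made-up numbers).
servers_and_storage = 400_000   # amortized per year
facilities_share    = 150_000   # datacenter space, power, cooling, network share

# Point 2: manpower as opportunity cost, not just salary.
ops_headcount        = 4
revenue_per_employee = 300_000  # what that headcount could earn you instead
mgmt_overhead        = 0.15     # cascading director/VP/CTO/CEO attention

diy_annual = (servers_and_storage + facilities_share
              + ops_headcount * revenue_per_employee * (1 + mgmt_overhead))

# Point 3: cost of delay while IT gets the private cloud working.
project_value_per_month = 100_000
months_of_delay         = 6
cost_of_delay = project_value_per_month * months_of_delay

cloud_annual = 60_000 * 12      # the scary-looking monthly Amazon bill

print(f"DIY, fully loaded:  {diy_annual:,.0f}")    # 1,930,000
print(f"Cloud bill:         {cloud_annual:,}")     # 720,000
print(f"Plus cost of delay: {cost_of_delay:,}")    # 600,000
```

With these (invented) inputs, the “expensive” cloud bill is well under half the true DIY number before you even count the delay.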

I’m not saying hybrid cloud, private cloud, etc. are never the answer – but I would say that on average they are usually not the right answer, and if you are using them as your default approach then it’s better than even money you’re being inefficient. Furthermore, using SaaS and PaaS requires less expertise than IaaS, which requires less than private cloud – people justify “starting with private” because you are “leveraging skill sets” or whatever – and then six months later you have a whole team still trying to bake off OpenStack vs. Eucalyptus when you could have had your app already running in a public PaaS. I’m not sure why I need to say out loud “delivering the most value with the least effort, time, and expenditure is good” – but apparently I do. Just because you *can* do something does not mean you *should* do it. You need to carefully shepherd your time to delivery and your costs, and not just let things float in a morass of IT because “these things take time…”


AWS Dying! Rackspace Pulls Out Of Cloud! News At 11!

Boy, it’s been quite a week for the cloud-schadenfreude crowd. If you listen to the various news outlets, apparently Rackspace has given up on cloud and Amazon is in free-fall. Here are some representative hack-job pieces:

More accurate are these:

Let’s look at what’s actually going on.

First, Rackspace.  I was on the Spiceworks forum yesterday and the news is definitely being interpreted as “Rackspace is getting out of cloud, don’t consider them any more.” Now, it is their own fault for bungling the messaging here, but if you actually go look at what they are doing, at its heart they are making this change:

Rackspace Cloud will be sold only with a support contract now.

Yes, that’s it. That’s the change. Now it’s “managed cloud!” Which is fine – a heck of a lot of software I buy has mandatory maintenance contracts nowadays – but this doesn’t mean “Rackspace is leaving the cloud business!” They just want to add their “Fanatical Support™” to the value proposition and not compete purely on a bare-metal (bare-API?) IaaS “how much does a 2-CPU 4 GB server cost” basis.

Rackspace has to get back out in front of this messaging, hard – it’s definitely made its way to the practitioner trenches as “they’re pulling out.” I have to say Rackspace’s strategy is pretty opaque to most folks anyway, but this messaging misstep has graduated from “muddled and unclear” to “actively harmful.”

Now, Amazon.  The real story is:

Amazon Web Services only grew 38.39% last quarter.

For a large company that’s a pretty good growth rate, right? Is yours higher? The press likes to turn IaaS into a three-provider horse race. But so far – it’s not. Check out this recent (March 2014) Synergy Research graph.

[Figure: Synergy Research Q4 2013 cloud infrastructure services market share graph]

The fact of the matter is that Amazon is beating the holy hell out of everyone else in IaaS. It’s more neck and neck in PaaS, but sadly the entire PaaS market is still low (due to Joe Average IT Shop basically interpreting PaaS pitches as someone standing up and screaming “I’m a sorcerer!!!”).

IBM, HP, etc. don’t have credible offerings yet. I know they’re investing, and I know they have roped some random companies that love them into doing it, but they are just not there yet. IBM is not a commodity company; they’re a “you have a billion-dollar contract with us, we’re going to build out whatever we feel like with that” company.

Google, same thing. It’s cool, it’s well priced, it’s dev friendly – but at the big price cut announcement, we had a big get-together at Capital Factory here in town. I looked around at the crowd of 40 clouderati types and said “OK, so who is comfortable running production apps on Google cloud yet?” Result: zero. Google’s throwing money at it but as with most of Google’s new offerings, it’s hard to trust it’s not just going to dry up tomorrow and get cancelled because they are running after private spaceships or whatever now, and nothing makes them money like their ad business so “it’s revenue generating” won’t save it. And Google is so bad at enterprise support…

Microsoft Azure was really good. Better than it had a right to be!  I was very impressed with Azure in years 1 and 2. Execution was good (we used it for a SaaS service at National Instruments) and the vision was definitely “where the puck is going to be.” But post-Ozzie, it hasn’t exactly been shaking the sheets. At CloudAustin there was more Azure interest two years ago than there is now. They were going strong on dev friendliness and all, but trying to get into IaaS has been a distraction and they just aren’t keeping pace with Amazon’s rate of new features. Docker support, SSDs, new instances, vCenter integration, Dropbox competitor, desktop-as-a-service Citrix competitor…

Let me address the four big “why AWS is crashing and burning (despite being in an obvious position of market dominance)” points from the “Scorpion” article.

  1. AWS is not the low price provider.
    Eh. Not sure why this is relevant, and also not sure it’s true for what you are getting… It’s like saying “there are books cheaper than that book you just bought.” Well, sure there are, but do they have the information I want in them? See below for why not always having the lowest cent-per-minute price versus Google and Microsoft doesn’t really concern me.
  2. AWS is not the best product at anything – most of their features are mediocre knock offs of other products.
    This misses the point – their features are SIMPLER knockoffs of other products. That’s why it’s an accelerator. Dropbox and Salesforce and all the successful cloud entities have said “you know, some enterprise user left to their own devices is going to generate a list of 1000 requirements they don’t really need. Forget that. Let’s build the actual core functionality they need and leave off the rest so it’ll actually get used.” This is why they dominate the IaaS business. Many of their products are named to match: “SIMPLE email service.” “SIMPLE queue service.” “SIMPLE notification service.” This drives a new wave of architectural thought – instead of complicated services packed with stuff, what if I integrate simple, well-designed microservices? Having done a lot of cloud architecture work, I see those attributes as positives, not negatives.
  3. AWS is unbelievably lousy at support.
    I’m not sure I’d want to be in a race with Amazon, Microsoft, and Google to see who supports customers worse. I’m not sure I’ve ever been part of an enterprise happy with its Google support, and all the experiences I’ve had with Microsoft support have been some Brazil-esque “you can’t actually ask them questions, only some VP who is a designated contact on the corporate contract can…”. Amazon is positioning themselves more like a hardware vendor: you don’t bother getting much support from them besides parts replacement; if you need more, you get it from the managed hosting provider or whatever MSP sits on top of them.
  4. Once you are at $200k / month of spend, it’s cheaper and much more effective to build your own infrastructure
    This is frequently untrue, and based on people not understanding the full costs of getting stuck in the infrastructure business. What’s your cost of delay? Average enterprise “wait for servers” time is about 6 weeks; assuming you’re not just using them for nothing, your ROI is delayed by that amount. And what about the operation of all those complex systems? You can’t just stick in the salary of the developers and sysadmins you’d need – stick in your revenue per employee instead, because that headcount could be doing something useful for your company instead of plumbing. Not to mention the cascading percentage of each layer of management’s time spent worrying about the plumbing and the plumbers instead of conducting the core business of the company. Cost of delay from lost agility and opportunity cost are never taken into account, but they definitely should be.

I know a lot of the old guard want cloud to dry up and go away, it bothers their lovely datacenters.  And some of the very new guard resent it because Amazon continues to be so successful – they keep up a rate of innovation that new players can’t disrupt. But this whole week of “the cloud is falling” news is complete BS, and won’t amount to much.


Friday is System Administrator Appreciation Day

Don’t forget, this Friday 7/25 is the annual celebration of System Administrator Appreciation Day.  Start dropping hints to your coworkers about your treat of choice now!  “DevOps means you have to care!(tm)”


Velocity 2014 After Action Report

[Photo: An Average Velocity Session]

Well, it was my first Velocity (I’ve been to every one, 2008 to present – you can read the previous reports here on the blog) as a vendor! So that was different, and I split time between working the Copperegg booth and going to sessions. As a result I’m not going to do the extensive session-by-session notes I’ve done in the past. Two other Agile Admins, James and Karthik, were there – I’m hoping they do some writeups of sessions they attended too!

Being a vendor was interesting; though standing at the booth made my dogs bark after the day was over, it was great to be able to talk to so many people. There were a lot of monitoring providers at the show (Copperegg (us), Compuware, New Relic, Datadog, many more).  Pingdom was right across from us, with a slate of guys shipped in from Sweden, but they were generally grumpy – jet lag or their recent acquisition, perhaps. A new log management SaaS provider was there, logentries.com, and that was interesting – Sumo is the only real one in the space since Loggly and SplunkStorm borked it up and they’ve been getting a little… “Enterprise-y?” By that I mean having sales reps call you 5x/day and wanting near-Splunk prices.  So yay to the newcomers, competition is always good. Other than that, it was mostly the same slate of Velocity-vendors as usual.

What’s New

Well, let’s get it out of the way – there wasn’t all that much new this year. Karthik complained to me that “last year, Velocity was my favorite conference ever, and this year I didn’t get much out of it.” Not every year hosts a bunch of new techniques, sadly, but I thought there were some gems in there. Here are the four major new trends taking up speech-space:

Docker docker docker containers containers containers. Learn it now, because in a year everything will be in containers – no, seriously. Largest splash in computing since Amazon AWS. The hype is a little overexcited at times, but there’s a lot of new development going on here. On the one hand, not everyone needs new-box spinup in 5s instead of 5m, and the efficiency gains are a tradeoff against security. But to be blunt, people stopped well short of exercising the elasticity and ephemerality of cloud/virtualization solutions, instead going for the more comfortable “let’s deploy a three-tier app manually like we did back in the day, but in the cloud” – so containers will be a disruption that pushes forward the concept of dynamic service orchestration and the like, which is good.

There is starting to be buzz around the Internet of Things. Mark Burgess (CFEngine, author of “In Search Of Certainty”) did a presentation on IoT and a more distributed model of monitoring and computation. Worth looking at, and it’s becoming more a part of mainstream computing (“engineering” tech and “IT” tech split off from each other 15 years ago for whatever reason and are just now joining forces again). Since we Agile Admins had all worked at National Instruments and had tried to get them onto the IoT bandwagon like five years ago, we grumped among ourselves about this.

There’s also strong interest in software defined networking (OpenDaylight, Cumulus). John Willis (@botchagalupe) waxed poetic on the topic and it fit into the general push towards making everything programmable.

There was strong and sustained interest (presentations, etc.) in STEM education, and specifically in women in tech and getting more women into tech.

Keynotes

[Photo: My Room At The Avatar]

Video of these should be publicly available so you can watch them.

Jeff Dean of Google did a very interesting talk on making large-scale services low latency that I recommend everyone view (video is at the link). Shared environments increase utilization but also congestion, and this is exacerbated by large fan-out systems – if each service is slow (say, over a second) on only 1% of calls, but you have to touch 100 services to finish your call, then 63% of calls take more than a second (1 − 0.99^100 ≈ 0.63). Traditional latency reduction uses techniques like differentiated service classes, breaking up large requests, and managing background activity (rate limiting, waiting for low load). Tolerating faults is a lot like tolerating variability – extra resources make your system reliable – so do the same for variability, but on a much shorter timescale. There are two ways to do that…

  1. Cross-Request Adaptation – examine recent behavior and make changes (load balance, scale) on a coarser timescale; this tends to make the “next call” faster. Fine-grained dynamic partitioning relies on equal sizes and constant load, but if you break each machine up into 10-100 chunks you can shed load more effectively. Use selective replication – in their query system they make more copies of important docs. Use latency-induced probation via your load balancer: take a slow box out of rotation, offload to other boxes, keep a shadow stream going to the original, and return it to service when it’s better.
  2. Within-Request Adaptation – make the call faster within the single call! Basically this is a series of refinements on “send the request two places.” First he modeled sending the request again to another server if it didn’t return in an expected amount of time. You can get cuter, like always sending to two destinations and having the one that starts working on it give a sideways “I’ve got it” to the other. His mathematical analysis says that you can cut latency dramatically for a very small increase in load, and not only that, but the response of a loaded cluster and an idle cluster become very similar (less dramatic spiking under load). A toy sketch of the basic idea follows below.
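Here’s that backup-request idea as a toy sketch – the straggler rate, timings, and replica names are all invented, and the cute “I’ve got it” cancellation is left out:

```python
import concurrent.futures as cf
import random
import time

def call_replica(replica_id):
    # Simulated backend: ~1% of calls are multi-second stragglers.
    time.sleep(5.0 if random.random() < 0.01 else 0.05)
    return f"response from replica {replica_id}"

pool = cf.ThreadPoolExecutor(max_workers=8)

def hedged_request(hedge_after=0.1):
    # Fire the primary; if it hasn't answered within hedge_after seconds,
    # fire a backup to a second replica and take whichever finishes first.
    # Cost: a tiny fraction of duplicated requests.
    futures = [pool.submit(call_replica, 1)]
    done, _ = cf.wait(futures, timeout=hedge_after)
    if not done:
        futures.append(pool.submit(call_replica, 2))
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    return next(iter(done)).result()

print(hedged_request())
```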

And I did one!  Just a 5 minute spot since Copperegg was a platinum sponsor; I talked about applying a Lean approach to implementing monitoring. It was called A 5 Minute Checklist For Application Monitoring and slides/video are at the link.  I also wrote a white paper to expand on it that’s available for download here.

Sessions

[Photo: California Sushi]

I went to a number of sessions that I enjoyed; here’s a quick breakdown of the ones I thought were winners.  I’ll try to find slides and link them where they exist. O’Reilly charges for the videos though.

Vladimir Vuksan’s workshop on Ganglia. People like the gathering of mass metrics. They did rake him over the coals a bit on the 15s time resolution and the relatively primitive RRDTool graphs. He had some interesting bits, like a “check that a value is the same everywhere” alert for consistency. He also summed up “why we monitor” well – MTTD, MTTR, trending, learning.

Theo Schlossnagle’s presentation on Understanding Slowness. He recommended a system map as step 1 – a high-level box-and-line diagram, but detailed, with all versions, locations, and service connections. He also talked about moving to histograms, but less sophisticated users find those hard to understand, so displaying quantiles can be a happy medium. He sees three different tool spaces: observational, synthetic, and manipulation.

There was a good presentation on the math around false alarms, using the “sensitivity” and “specificity” terms from medicine. Here’s a quick reference on those and how you calculate a positive predictive value. Undetected outages are embarrassing so the response is to narrow the monitoring thresholds but this just generates more false alerts, aka “pagerrhea.” This segued into the discussion of using better means to detect deviation – hysteresis, moving thresholds like Holt-Winters, cross-correlation of metrics, Fourier transforms. You should alert on whether work is getting done, not on CPU or swap but on HTTP response time and requests per second. He wants “something like nagios but that separates detection from diagnosis.”
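To see why narrowing thresholds breeds pagerrhea, plug those medical terms into Bayes’ rule. The numbers below are invented, but the shape of the result is the point: when real outages are rare, even a very “accurate” check pages you mostly falsely.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    # P(real outage | alert fired), straight from Bayes' rule.
    true_alerts = sensitivity * prevalence
    false_alerts = (1 - specificity) * (1 - prevalence)
    return true_alerts / (true_alerts + false_alerts)

# A 99%-sensitive, 99%-specific check, where only 1 in 1000 check
# intervals contains a real outage:
ppv = positive_predictive_value(0.99, 0.99, 0.001)
print(f"{ppv:.0%} of alerts are real")  # ~9% -- the other ~91% are pagerrhea
```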

I also really appreciated the LinkedIn talk on technical debt. They admitted that several years ago, they were trying to keep up in the social world and just ground to a halt because of accumulated technical debt. They had to stop and take a bunch of time to fix it before they could move forward. Important takeaways included:

  • Technical debt comes small decision by small decision
  • Don’t wait for version n+1, fix it now
  • “One in a million” problems happen a lot at web scale
  • Cancerous workarounds are no good
  • Broken window syndrome – if things are broken, people will tend to leave things broken
  • Zombie tech will eat you
  • Use our cool rest.li REST framework!
  • Employee engagement drains KPIs
  • Strategies – recognize debt choices and decisions
  • Use new eyes – consultants, interns – to identify the “bad parts”
  • Measure the right things
  • Technical debt you can see is only the tip of the iceberg
  • Make active decisions – otherwise, in Soviet Russia, Decision Makes You! (well, I added that last part)

The last really good one was about confirmation bias and monitoring. When dealing with metrics there are a lot of cognitive illusions – the anchoring effect (whatever it was recently, before it deviated, must have been right), the validity effect (a couple of people told me that, so it must be true), illusory correlation (looks like those happened around the same time), attitude polarization (round up the usual suspects). The way to combat this is with analysis. Rethink your data flow, validate your stats. Use anomaly detection like the open-sourced Skyline and Oculus to really detect correlations and deviations.

Though there weren’t as many breakthroughs this year, I appreciated the incremental uptick in wisdom about how to use what we have!

Social

Much of the benefit of conferences isn’t the sessions, it’s the great people you meet and share experiences with. Once you’ve been going a couple of years, you get to see old friends – though sadly none of our compatriots from Agile Admin alumni companies (National Instruments, Bazaarvoice, PowerReviews) were there, we did get to see most of the “usual suspects” from these shows. We had the usual “hang out at the Hyatt bar fiesta” with Andrew Schafer, John Willis, Ben Rockwood, Cameron Haight and Jonah Kowall from Gartner, Gene Kim, and many more. Notable in his absence was Patrick Debois, who remained in Belgium; we all missed him.

If you went to Velocity this year, chime in below (especially if we met you there!).


An article I wrote for InfoWorld’s New Tech Forum on all the various monitoring techniques: Know your options for infrastructure monitoring


DevOpsDays Silicon Valley 2014 Day Two Notes

Day Two

First, a public shaming. Some goober from Mirantis left this spam flyer on everyone’s car. Bah.

Presentations

@bridgetkromhout spoke on “how I learned to stop worrying and love devops” and @benzobot spoke on onboarding and mentoring apprentices. DoD SV certainly made a strong effort to get more female speakers this year! We tried in Austin (I personally wrote to just about every local women-in-tech group I could find) but we only had like one.

Then there were two super badass presentations back to back. I can’t find the slides online yet.

The Future of Configuration Management

Mark Burgess (@markburgess_osl), aka “The CFEngine Guy” and noted Promise Theory advocate, spoke. Chef and Puppet had eclipsed CFEngine for a while, but it turns out – as the Internet of Things and containers and the like arrive – that many of his design decisions were actually prescient rather than retro. Here it is broken down into wise sayings.

  • Why do we not have CAD for IT systems?
  • Orchestration is not bricklaying.
  • We need the equivalent of style sheets for servers.
  • We are entering a world of decentralized smart infrastructure.
  • Scale, complexity, and knowledge increase as our desire for flexibility increases.
  • Separation of concerns adds complexity and fragility.
  • To handle complexity – atomize and untether.
  • 3D printed datacenters are coming.

DevOps as Relationship Management

James Urquhart (@jamesurquhart) spoke about the interconnectedness of our systems. The SEC, after the flash crash, added circuit breakers, defined rollback protocols, and inserted agents into the flow of the stock exchange trading systems to prevent uncontrolled cascading.

One simple rule – visualize the whole system (monitor your outside relationships) but take action at the agent level. “How are you doing today?” “Good.” Monitoring is going well; new approaches in the space look at policies and interactions and performance and business metrics – but we need to differentiate reductionist vs. expansionist approaches.

Michael Nygard’s book Release It! is full of great patterns, and Netflix’s open-sourced Hystrix is an example of the kind of relational system safeguard you can build from them.
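Hystrix itself is a Java library and does much more (thread pool isolation, fallbacks, metrics); as a rough, language-neutral illustration of just the circuit breaker pattern at its core, with invented thresholds:

```python
import time

class CircuitBreaker:
    # After max_failures consecutive failures, fail fast for
    # reset_after seconds instead of hammering a sick dependency.
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result
```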

Ignite Block

  • Tips for Introverts (at Conventions) by Tom Duffield – They include find a role, don’t fear failure, attend preconference activities, go to lunch early and sit, engage, share interests, find a comfortable setting, take time to recharge. As someone initially introverted myself (no one believes that now) I like that this has actual tips to get past it; in some circles “introversion” has become the new “Asperger’s” as a blanket excuse for not wanting to bother to relate to people.
  • Mike Place on scalable container management – Google’s Kubernetes is an example. Don’t just provision your systems; you need to manage them too. Images came and went and have now come back, but you also can’t ignore what’s onboard the image. It’s time to join image and config management.
    This was really good and the world should listen. On the one hand, conducting CM operations on 1000 servers in parallel is contributing unnecessarily to the heat death of the universe.  On the other hand, you need to build those images in a non-manual way in the first place! And too many systems worry about the configuration but not the runtime operation. Amen brother!
  • Finally (well, there were two more, but I didn’t care for them so took no notes), John Willis (@botchagalupe) did [Darwin to] Deming to DevOps, a burst-fire reading list of nondeterminism tracing from Darwin through various scientists to the Deming/TPS stuff through into the DevOps world with Gene Kim and Patrick Debois.  It was pimp. Here it is when he gave it at another venue:

Conclusions

Here’s some big themes from the week.

  • Deterministic, reductionist, and centralized are for suckers.
  • Complexity is the enemy.  Systems thinking is necessary.
  • We love continuous deployment.  But DevOps is not just about delivering code to production.
  • Women exist in DevOps and are cool.  More would be great.
  • Most vendors have figured out to just relax and talk to techies in a way they might listen to.  Some haven’t.

It was a great event – kudos to Marius and the other organizers, who put in a lot of work to wrangle 500 people, nearly 30 sponsors, food, venue, and the like. If you haven’t been to a DevOpsDays, look around, there may be one near you! I help organize DevOpsDays Austin (we just had our third annual) and there are events coming this year everywhere from Tel Aviv to Minneapolis.

If you went to DoD SV, feel free and comment below with your thoughts (linking any posts you’ve made, slides, etc. is welcome too)!


DevOpsDays Silicon Valley 2014 Day One Notes

I have more notes from Velocity, but thought I’d do DevOpsDays first while it’s freshest in my brain. This isn’t a complete report – just my thoughts on the parts I felt moved to write down or that gave me a notable thought. More notes where I was learning, fewer where I wasn’t (not a reflection on the quality of a talk, just a sign I already knew a bit about the topic).

DevOpsDays Silicon Valley 2014 was June 27-28 at the San Jose Computer Museum. 500 people registered; not sure how many showed but I’d guess definitely in excess of 400.

Day One

State of the Union

First we had John Willis (@botchagalupe) giving the DevOps State of the Union. Here’s the slides (I know it says Amsterdam, he gave it there too.) This consisted of two parts – the first was a review of Gene Kim et al’s 2014 State of DevOps Report – go download it if you haven’t read it, it’s great stuff.

The second part is about how we are moving towards software defined everything – robust API driven abstractions decoupled from the underlying infrastructure. John’s really into software defined networking right now as it’s one of the remaining strongholds of static-suckiness in most infrastructures. A shout out to the blog at networkstatic.net and tools like mesos and Google’s kubernetes that are making computing even more fluid (see this article for some basics). “Consumable, composable infrastructure.”

Saying No

Next, our favorite Kanban expert Dominica DeGrandis (@dominicad) spoke on “Why Don’t We Just Say No?”  Here’s the slides. As a new product manager, and as a former engineering manager who had engineers that would just take on work till they burst even with me standing there yelling “No!  Don’t do it!”, it’s an interesting topic.

Why do you take on more work than you have capacity to do? She cites The Book of No by Susan Newman, Ph.D., and a very recent Psychology Today “Caveman Logic” post called Why So Many People Just Can’t Say No. She proposes that it is easier for devs to say no; ops have more pressing demands and are forced into too much yes. Some devs took exception to this on Twitter – “our product people make us do all kinds of stuff we don’t want to do” – but I think that’s different from the main point here. It’s not that “you have to do something you don’t like and are overruled when you say no” but that “you become severely over-committed due to requests from many quarters and being unwilling to say no.”

She goes through a great case study of changing a big ops shop over to a more modern “SRE” model, handling both interrupt and project work by getting metrics, setting a lower WIP limit, closing out >90-day-old tickets, and saying no to non-emergency last-minute requests. In fact, the latter is why I prefer scrum over kanban for operations so far – she contended that devs have an easier time saying no to interrupt work because of the sprint cadence. OK, so adopt a sprint cadence! Anyway, by having some clear definitions of done for workflow stages they managed to improve the state of things considerably. Use kaizen. The book about the Pixar story, Creativity Inc., talks about how the Pixar folks were running themselves ragged trying to finish Toy Story, till someone left their baby in a car because they were too frazzled. “Asking this much of people, even when they wanted to give it, was not acceptable.” What should your WIP level be? The level of “personal safety” would be a great start!

It’s interesting – I did some of these things at Bazaarvoice and tried to do some others too. But oftentimes the resistance would come from the very engineers the current process was working to death. “We can’t close those old tickets! They have valuable info and analysis, and it’s work that needs to happen!” “Yes, but our rate of work done and rate of work intake prove mathematically that they’ll never get done. Keeping them open is therefore us making a false promise to whoever logged those tickets.” Not everyone is able to ruthlessly apply logic to problems – you’d think that would be an engineer attribute, but in my experience it’s not really any more common than in the general population. But given that “not acceptable” quote above, I really struggled with how to get engineers who were burning themselves out to quit it. It’s harder than you’d think.
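The math there really is that blunt – with invented numbers:

```python
# Invented numbers: if intake outpaces throughput, the backlog only
# grows, so a ticket at the back of today's queue never gets reached.
intake_per_week = 25
closed_per_week = 20
old_ticket_backlog = 400

weeks = 52
growth = (intake_per_week - closed_per_week) * weeks
print(old_ticket_backlog + growth)  # 660 after a year -- "never" in practice
```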

Agile at Scale

Next was a fascinating case study of Capital One’s transformation to an agile, BDD, devops-driven environment, given by Adam Auerbach (@bugman31). The slides are available on Slideshare. They used the Scaled Agile Framework (SAFe) and BDD/acceptance-test-driven development with Cucumber, as well as continuous integration. In a later openspace there were people from Amex, city/state/federal governments, etc. trying to do the same thing – Agile and DevOps aren’t just for the little startups any more! He reported that it really improved their quality. Hmm, from the Googles it looks like the consulting firm LitheSpeed was involved; I met one of their principals at Agile Austin and he really impressed me.

Sales and Marketing Too

Sarah Goff-DuPont (@devtoolsuperfan) spoke about having sales and marketing join the agile teams as well.  Some tips included cross-pollinating metrics and joining forces on customer outreach.

Ignites

Just some quick thoughts from the day one Ignites.

  • @eriksowa on OODA and front end ops and screaming at your team in German (I am in favor of it)
  • Aater (futurechips) on data acquisition and multitenancy with docker
  • Jason Walker on LegoOps
  • Ho Ming Li on Introducing DevOps
  • @seemaj from Enstratius on classic to continuous delivery – slides. Pretty meaty with lots of tool shout-outs – grunt, bowler, angular, yo, bootstrap, grails, chef, rundeck, hubot, etc. I don’t mind a good laundry list of things to go find out more about!
  • Matt Ho on Docker+serf – with Docker there is a service lookup challenge. AWS tagging is a nice solution to that; serf does it for Docker like a peer-to-peer zookeeper. Then he used mustache templates to generate configs (a sketch of the pattern follows this list). This is worth looking at – I am a big fan of this approach (we did it ourselves at National Instruments years ago) and I frankly think it’s a crime that the rest of the industry hasn’t woken up to it yet.
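Here’s the config-templating sketch mentioned above. The service list and template are invented, and I’m using pystache as one Python implementation of mustache:

```python
import pystache  # pip install pystache

# Pretend this membership list came from serf (or AWS tag discovery).
backends = [{"host": "10.0.1.11", "port": 8080},
            {"host": "10.0.1.12", "port": 8080}]

template = """upstream app {
{{#backends}}
  server {{host}}:{{port}};
{{/backends}}
}"""

# Re-render (and reload the proxy) whenever cluster membership changes.
print(pystache.render(template, {"backends": backends}))
```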

Openspaces

If you haven’t done openspaces before, it’s where attendees pitch topics and the group self-organizes into a schedule around them. Here’s some pics of part of the resulting schedule:

[Photos: the openspace schedule boards]

I went to two.  The first was a combination of two openspace pitches, “Enterprise DevOps” and “ITIL, what should it be?”  This was unfortunately a bad combination.  Most folks wanted to talk about the former, and the Capital One guy was there and people from Amex etc. were starting to share with the group. But the ITIL question was mostly driven by a guy from the company that “bought ITIL” from the UK government and he had a bit of a vendory agenda to push.  So most of the good discussion there happened between smaller groups after it broke up.

The second was a CI/CD pipeline one, and I got this great pic of what people consider to be “the new standard” pipeline.

[Photo: Generally Accepted Continuous Integration and Delivery Pipeline]

Next, Day 2 and wrap-up!
