DevOpsDays Silicon Valley 2014 Day One Notes

I have more notes from Velocity, but thought I’d do DevOpsDays first while it’s freshest in my brain. This isn’t a complete report; it’s just my thoughts on the parts that moved me to write something down or sparked a notable thought. There are more notes where I was learning and fewer where I wasn’t (that’s not a reflection on the quality of the talk, just that I already knew a bit about some topics).

DevOpsDays Silicon Valley 2014 was June 27-28 at the San Jose Computer Museum. 500 people registered; I’m not sure how many showed, but I’d guess well over 400.

Day One

State of the Union

First we had John Willis (@botchagalupe) giving the DevOps State of the Union. Here are the slides (I know it says Amsterdam; he gave it there too). This consisted of two parts. The first was a review of Gene Kim et al.’s 2014 State of DevOps Report – go download it if you haven’t read it, it’s great stuff.

The second part was about how we are moving towards software defined everything – robust, API-driven abstractions decoupled from the underlying infrastructure. John’s really into software defined networking right now, as it’s one of the remaining strongholds of static-suckiness in most infrastructures. A shout-out to the blog at networkstatic.net and to tools like Mesos and Google’s Kubernetes that are making computing even more fluid (see this article for some basics). “Consumable, composable infrastructure.”

Saying No

Next, our favorite Kanban expert Dominica DeGrandis (@dominicad) spoke on “Why Don’t We Just Say No?” Here are the slides. As a new product manager, and as a former engineering manager whose engineers would take on work till they burst even with me standing there yelling “No! Don’t do it!”, I find it an interesting topic.

Why do you take on more work than you have capacity to do? She cites The Book of No by Susan Newman, Ph.D., and a very recent Psychology Today “Caveman Logic” post called Why So Many People Just Can’t Say No. She proposes that it is easier for devs to say no, while ops have more pressing demands and are forced into too much yes. Some devs took exception to this on Twitter (“our product people make us do all kinds of stuff we don’t like to”), but I think that’s different from the main point here. It’s not that “you have to do something you don’t like and are overruled when you say no” but that “you become severely over-committed due to requests from many quarters and being unwilling to say no.”

She went through a great case study of converting a big ops shop to a more modern “SRE” model that handles both interrupt and project work by gathering metrics, lowering the WIP limit, closing out tickets more than 90 days old, and saying no to non-emergency last-minute requests. In fact, the latter is why I prefer Scrum over Kanban for operations so far. She contended that devs have an easier time saying no to interrupt work because of the sprint cadence. OK, so adopt a sprint cadence! Anyway, by having clear definitions of done for each workflow stage, they managed to improve the state of things considerably. Use kaizen. The book about the Pixar story, Creativity, Inc., talks about how the Pixar folks were running themselves ragged trying to finish Toy Story, until someone left their baby in a car because they were so frazzled. “Asking this much of people, even when they wanted to give it, was not acceptable.” What should your WIP level be? The level of “personal safety” would be a great start!
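
A WIP limit is mechanical enough to sketch. The Python below is a minimal illustration, not anything from Dominica's talk: the WipBoard class, its limit, and the rule that emergencies bypass the limit are all assumptions made up for the example. Non-emergency requests that arrive when the board is full get a "not now" and wait in a queue instead of being started.

```python
from collections import deque


class WipBoard:
    """A hypothetical single-column kanban board with a WIP limit."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.in_progress = []   # items currently being worked
        self.waiting = deque()  # deferred requests, oldest first

    def request(self, item, emergency=False):
        """Accept, or defer, a new piece of work."""
        if emergency:
            # Assumption for this sketch: emergencies start immediately,
            # even if that temporarily pushes the board over its limit.
            self.in_progress.append(item)
            return "accepted (emergency)"
        if len(self.in_progress) < self.wip_limit:
            self.in_progress.append(item)
            return "accepted"
        # Board is full: say "not now" instead of silently over-committing.
        self.waiting.append(item)
        return "not now: WIP limit reached"

    def finish(self, item):
        """Complete an item, then pull waiting work only while under the limit."""
        self.in_progress.remove(item)
        while self.waiting and len(self.in_progress) < self.wip_limit:
            self.in_progress.append(self.waiting.popleft())


board = WipBoard(wip_limit=2)
print(board.request("patch kernels"))              # accepted
print(board.request("rotate certs"))               # accepted
print(board.request("new dashboard, today"))       # not now: WIP limit reached
print(board.request("site down", emergency=True))  # accepted (emergency)
```

With a limit of 2, the third routine request is deferred while the outage still gets in, and finishing work is the only thing that pulls deferred items forward.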

It’s interesting – I did some of these things at Bazaarvoice and tried to do some others too. But oftentimes the resistance came from the very engineers the current process was working to death. “We can’t close those old tickets! They have valuable info and analysis and it’s something that needs to happen!” “Yes, but our rate of work done and rate of work intake prove mathematically that they’ll never get done. Keeping them open is therefore us making a false promise to whoever logged those tickets.” Not everyone is able to ruthlessly apply logic to problems; you’d think that would be an engineering trait, but in my experience engineers are no better at it than the general population. But given that “not acceptable” quote above, I really struggled with how to get engineers who were burning themselves out to quit it. It’s harder than you’d think.
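
The “proves mathematically” argument is just flow arithmetic: if tickets arrive at least as fast as they are closed, the backlog never drains. Here is a minimal Python sketch of that calculation; the ticket counts and weekly rates are made up purely for illustration.

```python
import math


def weeks_to_drain(backlog, intake_per_week, done_per_week):
    """Weeks until the open-ticket backlog reaches zero, or math.inf if never.

    Each week the team closes `done_per_week` tickets while `intake_per_week`
    new ones arrive, so the backlog shrinks by the difference (if positive).
    """
    if backlog <= 0:
        return 0.0
    net_burn = done_per_week - intake_per_week
    if net_burn <= 0:
        # Intake matches or outpaces throughput: old tickets never get done.
        return math.inf
    return backlog / net_burn


# Hypothetical team: 400 open tickets, closing 20 a week.
print(weeks_to_drain(400, intake_per_week=25, done_per_week=20))  # inf
print(weeks_to_drain(400, intake_per_week=18, done_per_week=20))  # 200.0
```

Even the optimistic case needs 200 weeks, nearly four years, to clear the backlog, which is the false promise those open tickets represent.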

Agile at Scale

Next was a fascinating case study of Capital One’s transformation to an agile, BDD, DevOps-driven environment, given by Adam Auerbach (@bugman31). The slides are available on SlideShare. They used the Scaled Agile Framework (SAFe) and BDD/acceptance test driven development with Cucumber, as well as continuous integration. In a later openspace there were people from Amex, city/state/federal governments, etc. trying to do the same thing – Agile and DevOps aren’t just for the little startups anymore! He reported that it really improved their quality. Hmm, from the Googles it looks like the consulting firm LitheSpeed was involved; I met one of their principals at Agile Austin and he really impressed me.

Sales and Marketing Too

Sarah Goff-DuPont (@devtoolsuperfan) spoke about having sales and marketing join the agile teams as well.  Some tips included cross-pollinating metrics and joining forces on customer outreach.

Ignites

Just some quick thoughts from the day one Ignites.

  • @eriksowa on OODA and front end ops and screaming at your team in German (I am in favor of it)
  • Aater (futurechips) on data acquisition and multitenancy with docker
  • Jason Walker on LegoOps
  • Ho Ming Li on Introducing DevOps
  • @seemaj from Enstratius on classic to continuous delivery (slides). Pretty meaty, with lots of tool shout-outs: grunt, bower, angular, yo, bootstrap, grails, chef, rundeck, hubot, etc. I don’t mind a good laundry list of things to go find out more about!
  • Matt Ho on Docker+serf – with Docker there is a service lookup challenge. AWS tagging is a nice solution to that. Serf does the same for Docker, acting like a peer-to-peer ZooKeeper. Then he used Mustache to generate configs. This is worth looking at – I am a big fan of this approach (we did it ourselves at National Instruments years ago), and frankly I think it’s a crime that the rest of the industry hasn’t woken up to it yet.
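
To show the general shape of the discover-then-template approach from Matt’s Ignite, here is a rough Python sketch. It is an illustration under assumptions, not his code: the member records are hypothetical dicts standing in for whatever a discovery tool such as serf reports, Python’s string.Template stands in for Mustache, and the output is loosely modeled on HAProxy-style backend lines.

```python
from string import Template

# string.Template is a stand-in for Mustache here: both fill named
# placeholders from discovered data, though Mustache also has sections/loops.
SERVER_LINE = Template("    server $name $addr:$port check")


def render_backend(role, members, port):
    """Render an HAProxy-style backend stanza for live members with `role`.

    `members` is a hypothetical list of dicts with "name", "addr", "status",
    and "role" keys, standing in for a discovery tool's membership list.
    """
    lines = ["backend " + role]
    for m in sorted(members, key=lambda m: m["name"]):
        # Skip nodes that have failed or left, and nodes serving other roles.
        if m["status"] == "alive" and m["role"] == role:
            lines.append(SERVER_LINE.substitute(name=m["name"], addr=m["addr"], port=port))
    return "\n".join(lines) + "\n"


members = [
    {"name": "web2", "addr": "10.0.0.12", "status": "alive", "role": "web"},
    {"name": "web1", "addr": "10.0.0.11", "status": "alive", "role": "web"},
    {"name": "web3", "addr": "10.0.0.13", "status": "failed", "role": "web"},
    {"name": "db1", "addr": "10.0.0.21", "status": "alive", "role": "db"},
]
print(render_backend("web", members, 8080), end="")
```

In practice you would re-render and reload the consuming service whenever membership changes; serf can run event handlers when members join or leave, which is a natural trigger for that.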

Openspaces

If you haven’t done openspaces before, it’s where attendees pitch topics and the group self-organizes into a schedule around them. Here are some pics of part of the resulting schedule:

[Photos of the openspace schedule board]

I went to two.  The first was a combination of two openspace pitches, “Enterprise DevOps” and “ITIL, what should it be?”  This was unfortunately a bad combination.  Most folks wanted to talk about the former, and the Capital One guy was there and people from Amex etc. were starting to share with the group. But the ITIL question was mostly driven by a guy from the company that “bought ITIL” from the UK government and he had a bit of a vendory agenda to push.  So most of the good discussion there happened between smaller groups after it broke up.

The second was a CI/CD pipeline one, and I got this great pic of what people consider to be “the new standard” pipeline.

Generally Accepted Continuous Integration and Delivery Pipeline

Next, Day 2 and wrap-up!

Filed under Conferences, DevOps

Opening Velocity 2014

All right, I’m here in sunny San Jose for Velocity, the three-day Web operations and performance conference. It’s my first time attending as a sponsor, which is interesting. We have a whole cadre going; I flew in with Jenny and Lauren from CopperEgg as part of the advance squad. Because I just got in on this gig recently, I’m at the Avatar while they’re at the Hilton nearby. On the cab ride, they got a bit agitated over a tweet claiming we’re being exclusionary with our “The Dude” promos; I can see the misunderstanding, but it’s a Big Lebowski theme specifically cooked up by the women in our Marketing department.

After some IHOP breakfast and a long walk from the Avatar to the convention center came speaker check-in, where I got to chat with Mandy Walls, Vladimir Vuksan, and Andrew “Clay” Shafer. Apparently there’s a two-person limit on booth setup, so I don’t have to help with that. I’ll go report on Andrew’s talk, though I’ll have to duck out early for speaker orientation for my own talk.

Remember, if you can’t make it they’ll be streaming the morning keynotes on Wed/Thurs.  If you are here, grab me and say “Hi!”

Filed under Conferences

Meet The Agile Admins At Velocity/DevOpsDays Silicon Valley!

Three of the four agile admins (James, Karthik, and myself) will be out at Velocity and DevOpsDays this week. Say hi if you see us!

James will be doing a workshop with Gareth Rushgrove on Tuesday 9-10:30 AM, “Battle-tested Code without the Battle – Security Testing and Continuous Integration.” Get hands-on with gauntlt and other tools! [Conference site] [Lanyrd]

Ernest is doing a 5-minute sponsor keynote on Thursday, “A 5 Minute Checklist for Application Monitoring.” OK, so it’s during the USA vs Germany game – come see me anyway! I hate keynote sales pitches, so I’m not doing one; instead I’ll be talking about a Lean approach to monitoring and what to cover in your MVP. There’s a free white paper too, since what can you really say in 5 minutes? And so you know what to expect, the hashtag you’ll want to use is #getprobed! [Conference site] [Lanyrd]

Filed under Conferences

Monitoring and the State of DevOps

If you haven’t read the new 2014 State of DevOps Report from Puppet Labs and other luminaries, check it out now!

I also pulled out some of their findings on monitoring to inspire a post for the CopperEgg blog, Monitoring and the State of DevOps, which I thought I’d mention here too.

Filed under DevOps, Monitoring

Filtering Your Datadog Event Stream

At both NI and Bazaarvoice I was a Datadog user; I wrote a piece for them on filtering the event stream that has just been published on the Datadog blog.  Check it out!

Filed under DevOps

Agile Organization: Separate Teams By Discipline

This is the first in the series of deeper-dive articles that are part of Agile Organization Incorporating Various Disciplines. It’s very easy to keep reorganizing and trying different models without actually learning from the process. I’ve worked with all of these models, so I’m trying to condense their pros and cons to help people understand the implications of the organizational model they choose.

The Separate Team By Discipline Model

Separate teams striated by discipline is the traditional method of organizing technical teams – segmented horizontally by technical skill.  You have one or more development teams, one or more operations teams, one or more QA teams.  In larger shops you have even more horizontal subdivisions – like in an enterprise IT shop, under the banner of Infrastructure you might have a data center team, a UNIX admin team, a SAN team, a Windows admin team, a networking team, a DBA team, a telecom team, applications administration team(s), and so on. It’s more unusual to have the dev side specifically segmented horizontally by tech as well (“Java programmers,” “COBOL programmers,” “Javascript programmers”) but not unheard of; it is more commonly seen as “UX team, services team, backend team…”

In this setup in its purest form, each team takes on tasks for a given product or project, works on them, and either returns them to the requester or passes them on to yet another team. Team members are not dedicated to that product or effort in any way beyond the hours they spend working on its to-dos. Usually this manifests as a waterfall approach: a product or feature is conceived, handed to developers to develop, handed to QA to test, and finally handed to Operations to deploy and maintain.

This model dates back to the mainframe days, where it works pretty well – you’re not innovating on the infrastructure side, the building’s been built, you’re moving your apps into the pre-built apartment units. It also works OK when you have heavy regulation requirements or are constrained to extensively documenting requirements, then design, etc. (government contracts, for example).

It works a lot less well when you need to move quickly or need any kind of change or innovation from the other teams to achieve your goal. Linking up prioritization across teams is always hard, but that’s the least of the issues. Teams all have their own goals, their own cadences, and even their own cultures and languages. The oft-repeated warning that “devs are motivated to make changes and ops is motivated by system stability” is a trivial example of this mismatch of goals. If the shared teams are supporting a limited number of products, it can work. When there are competing priorities, I’ve seen it be extremely painful. I worked in a shop where the multiple separate dev teams were vertical (organized by line of business) but the operations teams were horizontal (organized by technical specialty), and frankly, trying to get anything done across the impedance mismatch that setup generated was the nightmare that sent me down the Agile and DevOps path initially.

Benefits of Disciplinary Teams

The primary benefit of this approach is that you tend to get stable teams of like individuals, which allows them to bond with those of similar skills and experience organizational stability and esprit de corps. Providing this sense of comfort ends up being the key challenge of the other organizational approaches.

The second benefit is that it provides a good degree of standardization across products – if one ops team is creating the infrastructure for various applications, then there will be some efficiencies there. This value is almost always partially, and sometimes more than entirely, offset by the fact that not all apps need the same thing and that centralized teams bottleneck delivery and reduce velocity. I remember the UNIX team that would only provide my Web team expensive servers, even though we were highly horizontally scaled and told them we’d rather have twice as many $3500 servers than half as many $7000 servers, since that would serve uptime, performance, etc. much better. But progress on our product was offered up on the altar of nominal cost savings from homogeneity.
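
To put rough numbers on that server anecdote, here is a back-of-the-envelope Python sketch. The $28,000 budget is hypothetical, and treating every node as an equal share of the tier is a simplification (a $7000 box may well do more work than a $3500 one); the point is only how much of the fleet a single failure takes out.

```python
def fleet_size(budget, unit_cost):
    """How many servers a fixed hardware budget buys at a given unit price."""
    return budget // unit_cost


def share_lost_per_failure(nodes):
    """Fraction of a horizontally scaled tier that one node failure removes,
    treating every node as an equal share of the load (a simplification)."""
    return 1 / nodes


BUDGET = 28_000  # hypothetical hardware budget, in dollars
for cost in (3_500, 7_000):
    n = fleet_size(BUDGET, cost)
    print(f"${cost:,} servers: {n} nodes; one failure takes out {share_lost_per_failure(n):.1%}")
# $3,500 servers: 8 nodes; one failure takes out 12.5%
# $7,000 servers: 4 nodes; one failure takes out 25.0%
```

Same spend, but each failure in the cheaper, wider fleet costs half as much of the tier, which is the uptime argument we were making.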

The third benefit is that if the horizontal teams are correctly cross-trained, it is easier to avoid single points of failure; by collecting the workers skilled in something into one group, losses are more easily picked up by others in the group. I have to say, though, that in my experience this benefit is more honored in the breach – teams tend to naturally divide up until there’s one expert on each thing, and managers who actively maintain a skills portfolio and drive cross-training are sadly rare.
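
Cross-training only works if someone keeps an eye on where the single points of failure are. A minimal Python sketch of that check over a skills matrix follows; the matrix format (each skill mapped to the people who can cover it) and the names in it are hypothetical.

```python
def single_points_of_failure(skills):
    """Return the skills covered by one person or nobody, sorted by name.

    `skills` maps a skill or system to the people able to handle it.
    """
    return sorted(skill for skill, people in skills.items() if len(set(people)) <= 1)


def sole_experts(skills):
    """Map each person to the skills only they can cover."""
    owners = {}
    for skill, people in skills.items():
        if len(set(people)) == 1:
            owners.setdefault(people[0], []).append(skill)
    return owners


# Hypothetical ops team skills matrix.
skills = {
    "SAN": ["ana"],
    "DNS": ["ana", "raj"],
    "backups": [],
    "load balancers": ["raj", "lee", "ana"],
    "mail relay": ["lee"],
}
print(single_points_of_failure(skills))  # ['SAN', 'backups', 'mail relay']
print(sole_experts(skills))              # {'ana': ['SAN'], 'lee': ['mail relay']}
```

The second function gives a manager the cross-training agenda directly: each sole expert should be pairing someone else into those areas.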

Drawbacks of Disciplinary Teams

Conway’s Law is usually invoked to worry about vertical divisions in a product – one part of the UI written by one team, another by another, such that it looks like a Frankenstein’s monster of a product to the end user. However, the principle applies to horizontal divisions as well – these produce more of a Human Centipede, with the output of one phase becoming the input of the next. The front end may not show any clear sign of division, but the seams in quality, reliability, and agility of the system can grow like a cancer underneath, which users certainly discover over time.

This approach promotes a host of bad behaviors. Pushing work to other people is always tempting, as is taking shortcuts if the results of those shortcuts fall on another’s shoulders. With no end-to-end ownership of a product, you get finger pointing and no one taking responsibility for driving the excellence of the service. And without an overall systems thinking perspective, attempts by one of the teams in that value chain to drive an improvement in their domain often have unintended effects on the other teams in the chain that may or may not result in overall improvement. If engineers don’t eat their own dog food, but pass it on to someone else, then chronic quality problems often result. I personally spent years trying to build process and/or relationships to mitigate the dev->QA->ops passing of issues downstream, with only mixed success.

Another way of stating this is that shared services teams always provide a route to the tragedy of the commons. Competing demands from multiple customers and the need for “nonfunctional” requirements (performance, availability, etc.) could in theory all be reconciled in priority by a strong product organization – but in my experience this is uncommon; product orgs tend not to care about prioritizing back end concerns and are more feature driven. Most product orgs I have dealt with have been more or less resistant to taking on platform teams, managing nonfunctional requirements, and otherwise handling that half of the demands on the product. Without consistent prioritization, shared teams then become the focus of a lot of lobbying by all their internal customers trying to get resources. These teams are frequently understaffed and thus a bottleneck to overall velocity.

Ironically, in some cases this can be beneficial – technically, focusing on cost efficiency over delivering new value is always a losing game, but some organizations are self-unaware enough that they have teams continuing to churn out “stuff” without real ROI associated (our team exists therefore we must make more), in which case a bottleneck is actually helpful.

Mitigations for the weaknesses of this approach (abdication of responsibility and bottlenecking constraints) include:

  1. Very strong process guidance. “If only every process interface is 100% defined, then this will work”, the theory goes, just as it works on a manufacturing line.  Most software creation, however, is not similar to piecing components together to make an iPod. In one shop we worked for years on making a system development process that was up to this task, but it was an elusive goal. This is how, for example, Microsoft makes the various Office products look the same in partial defiance of Conway’s Law – books and books of standards.
  2. Individuals on shared teams with functional team affinities. Though not going as far as embedding into a product team, you can have people in the shared teams who are the designated reps for various client teams. Again, this works better when there is a few-to-one instead of a many-to-one relationship.  I had an ops team try this, but it was a many-to-one environment and each individual engineer ended up with three different ownership areas, which was overwhelming. In addition, you have to be careful not to simply dedicate all of one sort of work to one person, as you then create many single points of failure.
  3. Org variation: Add additional crossfunctional teams that try to bridge the gap.  At one place I worked, the organization had accepted that trying to have the systems needs of their Web site fulfilled by six separate infrastructure teams was not working well, so they created a “Web systems” team designed to sit astride those, take primary responsibility, and then broker needs to the other infrastructure teams. This was an improvement, and led to the addition of a parallel team responsible for internal apps, but never really got to the level of being highly effective. In addition those were extremely high-stress roles, as they bore responsibility but not control of all the results.

Conclusion

Though this is the most typical organization of technology teams historically, that history comes from a place much different than many situations we find ourselves in today. The rapid collaboration approach that Agile has brought us, and the additional understanding that Lean has given us in the software space, tell us that though this approach has its merits, it is much overused, and other approaches may be more effective, especially for product development.

Next, we’ll look at embedded crossfunctional service teams!

Filed under Agile, DevOps

Use Gauntlt to test for Heartbleed

Heartbleed is making headlines and everyone is making a mad dash to patch and rebuild. Good, you should. This is definitely a nightmare scenario but instead of using more superlatives to scare you, I thought it would be good to provide a pragmatic approach to test and detect the issue.

@FiloSottile wrote a tool in Go to check for the Heartbleed vulnerability. It was provided as a website in addition to a tool, but when I tried to use the site, it seemed over capacity. Probably because we are all rushing to find out if our systems are vulnerable. To get around this, you can build the tool locally from source using the install instructions on the repo. You need Go installed and the GOPATH environment variable set.

go get github.com/FiloSottile/Heartbleed
go install github.com/FiloSottile/Heartbleed

Once it is installed, you can easily check to see if your site is vulnerable.
Heartbleed example.com:443

Cool! But let’s do one better and implement this as a gauntlt attack, so that we can make sure we don’t have regressions and can automate this a bit further. Gauntlt is a rugged testing framework that I helped create. Its main goal is to facilitate security testing early in the development lifecycle. It does so by wrapping security tools with sane defaults and using Gherkin (Given, When, Then) syntax so that it is easily understood by dev, security, and ops groups.

In the latest version of gauntlt (gauntlt 1.0.9) there is support for Heartbleed. Note that gauntlt doesn’t install tools, so you will still have to follow the steps above if you want the gauntlt attacks to work. Let’s check for Heartbleed using gauntlt.

gem install gauntlt
gauntlt --version

You should see 1.0.9. Now let’s write a gauntlt attack. Create a text file called heartbleed.attack and add the following contents:

@slow
Feature: Test for the Heartbleed vulnerability

  Scenario: Test my website for the Heartbleed vulnerability (see heartbleed.com for more info)

    Given "Heartbleed" is installed
    And the following profile:
      | name   | value       |
      | domain | example.com |
    When I launch a "Heartbleed" attack with:
      """
      Heartbleed <domain>:443
      """
    Then the output should contain "SAFE"

You now have a working gauntlt attack that tests for Heartbleed and can be hooked into your CI/CD pipeline. To see this example attack file on GitHub, go to https://github.com/gauntlt/gauntlt/blob/master/examples/heartbleed/heartbleed.attack.

To run the attack:

$ gauntlt ./heartbleed.attack

You should see output like this:
$ gauntlt ./examples/heartbleed/heartbleed.attack
Using the default profile...
@slow
Feature: Test for the Heartbleed vulnerability

Scenario: Test my website for the Heartbleed vulnerability (see heartbleed.com for more info) # ./examples/heartbleed/heartbleed.attack:4
Given "Heartbleed" is installed # lib/gauntlt/attack_adapters/heartbleed.rb:4
And the following profile: # lib/gauntlt/attack_adapters/gauntlt.rb:9
| name | value |
| domain | example.com |
When I launch a "Heartbleed" attack with: # lib/gauntlt/attack_adapters/heartbleed.rb:1
"""
Heartbleed <domain>:443
"""
Then the output should contain "SAFE" # aruba-0.5.4/lib/aruba/cucumber.rb:131

1 scenario (1 passed)
4 steps (4 passed)
0m3.223s
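
Your CI server will normally decide pass/fail from the command’s exit status. If you also want to record or report the counts from summary lines like the ones above, a small parser like this Python sketch would do it. It assumes the cucumber-style “N scenarios (X passed, Y failed)” summary format seen in this output and is not part of gauntlt itself.

```python
import re

# Matches cucumber-style summary lines such as "1 scenario (1 passed)"
# or "4 steps (3 passed, 1 failed)".
SUMMARY = re.compile(r"^(\d+) (scenarios?|steps?) \(([^)]*)\)$")


def parse_summary(output):
    """Collect per-status counts for scenarios and steps from runner output."""
    counts = {}
    for line in output.splitlines():
        m = SUMMARY.match(line.strip())
        if not m:
            continue  # skip feature text, timings like "0m3.223s", etc.
        kind = m.group(2).rstrip("s")  # "scenarios" -> "scenario"
        statuses = {}
        for part in m.group(3).split(","):
            n, status = part.strip().split(" ", 1)
            statuses[status] = int(n)
        counts[kind] = {"total": int(m.group(1)), **statuses}
    return counts


def attack_passed(output):
    """True only if scenarios ran and every one of them passed."""
    scenarios = parse_summary(output).get("scenario")
    return bool(scenarios) and scenarios.get("passed", 0) == scenarios["total"]


sample = "1 scenario (1 passed)\n4 steps (4 passed)\n0m3.223s"
print(parse_summary(sample))
print(attack_passed(sample))
```

Treating “no scenarios found” as a failure is a deliberate choice here: an attack file that silently stops running should not look like a clean bill of health.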

Good luck! Let me (@wickett) know if you have any problems.

Filed under DevOps, Security