Tag Archives: DevOps

by Ernest Mueller | March 17, 2014 · 6:50 pm

DevOpsDays Austin Is Coming!

The third annual DevOpsDays conference in Austin will be May 5-6 (Cinco de Mayo!) at the Marchesa, where it was held last year! As many of you know, the DevOpsDays conferences are a super popular format – half talks from practitioners, half openspaces, all fun – held in many cities around the world since the first one in Ghent launched the DevOps movement proper.

You can register – all the early bird tickets are sold out but the regular ones are only half gone.
You can also propose a talk! There’s 35-minute full talk slots but we’re even more in need of 5-minute Ignite! style lightning talks! RFP ends 3/26 sp
You can sponsor! The Gold sponsorships are half gone already. And we have some special options this year…

DevOpsDays Austin has been bigger and better every year since its inception and should have something good for everyone this year. Come out and join your comrades from the trenches who are trying to forge a new way of delivering and maintaining software!

1 Comment

Filed under Conferences, DevOps

Tagged as austin, conference, DevOps, devopsdays

Link

Agile Austin asked me to help re-launch their blog, so I’ve contributed a piece on “What Is DevOps?” for them!

Leave a comment

by Ernest Mueller | March 16, 2014 · 9:58 am

by Ernest Mueller | March 12, 2014 · 12:29 pm

Agile Organization Incorporating Various Disciplines

I started thinking about this recently because there was an Agile Austin QA SIG meeting that I sadly couldn’t attend entitled “How does a QA manager fit into an Agile organization?” which wondered about how to fit members of other disciplines (in this case QA) in with agile teams. Over the last couple years I’ve tried this, and seen it tried, in several ways with DevOps, QA, Product Management, and other disciplines, and I thought I’d elaborate on the pros and cons of some of these approaches.

Two Fundamental Discoveries

There are two things I’ve learned from this process that are pretty universal in terms of their truth.

1. Conway’s Law is true. To summarize, it states that a product will tend to reflect the structure of the organization that produces it. The corollary is that if your organization has divisions which are of no practical value to the product’s consumer, you will be creating striations within your product that impact client satisfaction. Hence basic ITSM and Agile doctrines on creating teams around owning a service/product.

2. People want to form teams and stay with them. This should be obvious from basic psychology/sociology, but if you set up an organization that is too flexible it strongly degrades the morale of your workers. In my previous role we conducted frequent engineer satisfaction surveys and the most prominent truth drawn from them is the more frequently people are asked to change roles, reorg, move to different teams, the less happy they are. Even people that want to move around to new challenges frequently are happier if they are moving to those new challenges with a team they’ve had an opportunity to move through Tuckman’s stages of group development with.

I have seen enough real-world quantified proof of both of these assertions that I will treat them as assumptions going forward.

Organizational Options

We tried out all four of these models within the same organization of high performing engineers and thus had a great opportunity to compare their results.

Separate Teams

When we started, we had the traditional model of separate teams which would hand off work to each other. “Dev,” “Ops,” “QA,” “Product” were all under separate management up through several levels and operated as independent teams; individual affinities with specific products were emergent and simply matters of convenience (e.g. “Oh, he knows a lot about that BI stuff, let him handle that request”) and not a matter of being dedicated to specific product(s).

Embedded Crossfunctional Service Teams

Our first step away from the pure separate team model was to take those separate teams and embed specific members from them into service oriented teams, while still having them report to a manager or director representing their discipline. In some cases, the disciplinary teams would reserve some number of staff for tool development or other cross-cutting concerns. So QA, for example, had several engineers assigned to each product team, even though they were regarded as part of the permanent QA org primarily.

We (very loosely) considered our approach to be decentralized and microservice based; Martin Fowler is doing a good article in installments on Microservice Architecture if you want more on that topic.

Fully Integrated Service Teams

With our operations staff we went one step farther and simply permanently assigned them to product teams and removed the separate layer of management entirely. Dev and “DevOps” engineers reported to the same engineering manager and were a permanent part of a given product team. Any common tooling needed was created by a separate “platform” engineering team which was similarly integrated.

Project Based Organization

Due to the need to surge effort at times, we also had some organizations that were project, not product, based. Engineers would be pulled either from existing teams or entire teams would additionally be pulled into a short term (1, 2, 3 month) effort to try to make significant headway across multiple products, and then dissolve afterwards.

Hmm, this looks like it’s getting big (and I need to do some diagrams). I’ll break it up into separate articles for each type of org and its pros and cons, and then a conclusion.

12 Comments

Filed under Agile, DevOps

Tagged as agile, DevOps, microservices, organization, Q&A

Video

Trusted Software Alliance launches new podcast and news series

The Trusted Software Alliance News Network launched this week and is featuring 5 minute daily doses of AppSec and DevOps news. The show is run by @eusp along with weekly co-hosts @damonedwards, @cote and yours truly (@wickett). Check out the inaugural post and follow the blog at trustedsoftwarealliance.com.

Leave a comment

by wickett | January 17, 2014 · 9:56 am

by Ernest Mueller | November 25, 2013 · 8:00 am

A DevOps Thanksgiving

This last week at the Agile Austin DevOps SIG, our topic was simple – “A DevOps Thanksgiving.” We all shared what we’re thankful for from the DevOps world this year – things that have made our lives better.

It was a nice and refreshing discussion! People mentioned the things making their lives better. Group members expressed their thanks for such diverse things as DevOps Weekly, rspec-puppet, The Phoenix Project, Vagrant, Docker, test-kitchen with serverspec and bats, provisioned IOPS in AWS, DevOps Cafe, The Ship Show, increasing crossplatform support in DevOps tools and thinking, DevOps tracks springing up at conferences like Agile 2013 and AppSec, DevOpsDays… Thanks to all the people who put in lots of their hard work to make them all possible!

In retrospect we have a lot to be thankful for. Even though the techno-hipsters don’t even want to say the word “DevOps” any more, it’s a very real change bringing better things to our tools, products, and even lives. I know I’ve seen a lot of change in the teams I’ve worked with that have implemented it – fewer “all hands overnight releases,” less psychotic oncall, less inter-group hatefulness – DevOps has brought us all a lot of good things, and it’s just starting to take hold out there in the industry.

How about you? What DevOps thing were you thankful for this year? Add into the comments here, blog it up yourself, tweet it (I suggest #devopsthanksgiving as the hashtag)… Spread the thanks!

Leave a comment

Filed under DevOps

Tagged as 2013, culture, DevOps, thanksgiving

by Ernest Mueller | July 19, 2013 · 10:58 am

Scrum for Operations: How We Got Started

Welcome to the newest article in Scrum for Operations. I started this series when I was working for NI. But now I’m going through the same process at BV so time to pick it back up again! Like my previous post on Speeding Up Releases, I’m going to go light on theory and heavy on the details, good and bad, of how exactly we implemented Agile and DevOps and where we are with it.

Here at BV (Bazaarvoice), the org had adopted Agile wholesale just a couple months before I started. We also adopted DevOps shortly after I joined by embedding ops folks in the product teams. Before the Agile/DevOps implementation there was a traditional organization consisting of many dev teams and one ops team, with all the bottlenecking and siloing and stuff that’s traditional in that kind of setup. Newer teams (often made up of newly hired engineers, since we were growing quickly) that started out on the new DevOps model picked it up fine, but in at least one case we had a lot of culture change to do with an existing team.

Our primary large legacy team is called the PRR team (Product Ratings and Reviews) after the name of their product, which now does lots more than just ratings and reviews, but naturally marketing rebranding does little to change what all the engineers know an app is called. Many of the teams working on emerging greenfield products that were still in development had just one embedded ops engineer, but on our primary production software stack, we had a bunch. PRR serves content into many Internet retailer’s pages; 450 million people see our reviews and such. So for us scalability, performance, monitoring, etc. aren’t a sideline, they’re at least half of the work of an engineering team!

This had previously been cast as “a separate PRR operations team.” The devs were used to tossing things over the wall to ops and punting on the responsibility even if it was their product, and the ops were used to a mix of firefighting and doing whatever they wanted when not doing manual work the devs probably should have automated.

I started at BV as Release Manager, but after we got our releases in hand, I was asked to move over to lead the PRR team and take all these guys and achieve a couple major goals, so I dug in.

Moving Ops to Agile

I actually started implementing Agile with the PRR Ops team because I managed just them for a couple months before being given ownership of the whole department. I had worked closely with many of them before in my release manager role so I knew how they did things already. The Ops team consisted of 15 engineers, 2/3 of which were in Ukraine, outsourced contractors from our partner Softserve.

At start, there was no real team process. There were tickets in JIRA and some bigger things that were lightly project managed. There was frustration with Austin management about the outsourcers’ performance, but I saw that there was not a lot of communication between the two parts of the team. “A lot of what’s going bad is our fault and we can fix it,” I told my boss.

Standups

A the first process improvement step, I introduced daily standups (in Sep 2012). These were made more complicated by the fact that we have half of our large team in Ukraine; as a result we used Webex to conduct them. “Let’s do one Austin standup and one Ukraine standup” was suggested – but I vetoed that since many of the key problems we were facing were specifically because of poor communication between Austin and Ukraine. After the initial adjustment period, everyone saw the value of the visibility it was giving us and it became self-sustaining. (With any of these changes, you just have to explain the value and then make them do it a little while “as a pilot” to get it rolling. Once they see how it works in practice and realize the value it’s bringing, the team internalizes it and you’re ready for the next step.) Also because of the large size and international distribution I did the “no-no” of writing up the standup and sending the notes out in email. It wasn’t really that hard, here’s an example standup email from early on:

Subject: PRR Infrastructure Daily Standup Notes 11/05/2012

Individual Standups
(what you did since last standup, what you will do by the next standup, blockers if any)

Alexander C – did: AVI-93 dev deploy testing of c2, release activity training; will do: finish dev c2, start other clusters
Anton P – did: review AVI-271 sharded solr in AWS proxy, AVI-282 migrating AWS to solr sharding; will do: finish and test
Bryan D – did: Hosted SEO 2.0 discussion may require Akamai SSL, Tim’s puppet/vserver training, DOS-2149 BA upgrade problems, document surgical deploy safety, HOST-71 lab2 ssh timeout, AVI-790, 793 lab monitoring, nexus errors; will do: finish prep Magpie retro, PRR sprint planning, Akamai tickets for hosted SEO, backlog task creation.
Larry B – did: MONO-107,109 7.3 release branch cut, release training; will do: AVI-311 dereg in DNS (maybe monitoring too?)
Lev P – did: deploy script change testing AVI-771; will do: more of that
Oleg K – did: review AVI-676 changes, investigate deployment runbooks/scripts for solr sharding AVI-773; to do: testing that, AVI-774 new solr slaves
Oleksandr M – did: out Friday after taking solr sharding live; will do: prod cleanup AVI-768, search_engine.xml AVI-594
Oleksandr Y – did: AVI-789 BF monitoring, had to fix PDX2 zabbix; will do: finish it and move to AVI-585 visualization
Robby M – did: testing AVI-676 and communicating about AWS sharding; will do: work with Alex and do AVI-698 c7 db patches for solr sharding
Sergii V – did: AVI-703 histograms, AVI-763 combining graphs; will do: continue that and close AVI-781 metrics deck
Serhiy S – did: tested aws solr puppet config AVI-271, CMOD stuff AVI-798, AVI-234
Taras U – did: tested BVC-126599 data deletion. Will do: pick up more tickets for testing
Taras Y – did: AVI-776 black Friday scale up plan, AVI-762 testing BF scale up; will do: more scale up testing
Vasyl B – did: MONO-94 GTM automation to test; will do: AVI-770 ftp/zabbix thing
Artur P – did: AVI-234 remove altstg environment, AVI-86 zabbix monitoring of db performance “mikumi”; will do: more on those

For context, while this was going on we were planning for Black Friday (BF) and executing on a large project to shard our Solr indexes for scaling purposes. The standup itself brought loads of visibility to both sides of the team and having the emails brought a lot of visibility to managers and stakeholders too. It also helped us manage what all the outsourcers were doing (I’ll be honest, before that sometimes we didn’t know what a given guy in Ukraine was doing that week – we’d get reports in code later on, but…).

I took the notes in the standup straight into an email and it didn’t really slow us down (I cheated by having the JIRA project up so I could copy over the ticket numbers). Because of the number of people, the Webex, and the language barrier the standups took 30 minutes. Not the fastest, but not bad.

Backlog

After everyone got used to the standups, I introduced a backlog (maybe 2 weeks after we started the standups). We had JIRA tickets with priorities before, but I added a Greenhopper Scrum style backlog. Everyone got the value of that immediately, since “we have 200 P2 tickets!” is obviously Orwellian at best. When stakeholders (my boss, other folks) had opinions on priorities we were able to bring up the stack-ranked backlog and have a very clear discussion about what it was more or less urgent/important than. (Yes, there were a couple yelling matches about “it’s meaningless to have five ‘top priorities!'” before we had this.) Interrupt tickets would just come in at the top.

Here’s a clip of our backlog just to get the gist of it…

All the usual work… just in a list. “Work it from the top!” We still had people cherry-picking things from farther down because “I usually work on builds” or “I usually work on metrics” but I evangelized not doing that.

Swimlanes

Using this format also gave me insight into who was doing what via the swimlanes view in JIRA. When we’d do the standup we started going down in swimlane order and I could ask “why I don’t see that ticket” or see other warning signs like lots of work in progress. An example swimlane:

This helped engineers focus on what they were supposed to be doing, and encouraged them to put interrupts into the queue instead of thrashing around.

Sprints

Once we had the backlog, it was time to start sprinting! We had our first sprint planning meeting in October and I explained the process. They actually wanted to start with one week sprints, which was interesting – in the dev world often times you start in with really long (4-6 week) sprints and try to get them shorter as you get more mature. In the ops world, since things are faster paced, we actually started at a week and then lengthened it out later once we got used to it.

The main issue that troubled people was the conjunction of “interrupt” tickets with proactive implementation tickets. This kind of work is why lots of people like to say “Ops should use kanban.”

However, I learned two things doing all this. The first is that for our team at least, the lion’s share of the work was proactive, not reactive, especially if you use a 1-2 week lookahead. “Do they really need that NOW, or just by next sprint?” Work that used to look interrupt driven under a “chaos plus big projects” process started to look plannable. That helped us control the thrash of “here’s a new urgent request” and resist it breaking the current sprint where possible.

Also, the amount of interrupt work varies from day to day but not significantly for a large team over a 1-2 week period. This means that after a couple sprints, people could reliably predict how many points of stories they could pull because they knew how much time got pulled to interrupt work on average. This was the biggest fear of the team in doing sprint planning – that interrupt work would make it impossible to plan – and there was no way to bust through it except for me to insist that we do a couple sprints and reevaluate. Once we’d done some, and people learned to estimate, they got comfortable with it and we’ve been scrumming away since.

And the third thing – kanban is harder to do correctly than Scrum. Scrum enforces some structure. I’ve seen a lot of teams that “use kanban” and by that they mean “they do whatever comes to mind, in a completely uncontrolled manner,” indistinguishable from how Ops used to do things. Real kanban is subtle and powerful, and requires a good bit of high level understanding to do correctly. Having a structure helped teach my team how to be agile – they may be ready for kanban in another 6 months or so, perhaps, but right now some guard rails while they learn a lot of other best practices are serving us well.

Poker Planning

After the traditional explanation (several times) about what story points are, people started to get it. We used planningpoker.com for the actual voting – it’s a bit buggy but free, and since sprint planning was also 15 people on both (or more) sides of a Webex, it was invaluable.

Velocity

It’s hard to argue with success. We watched the team velocity, and it basically doubled sprint to sprint for the first 4 sprints; by the end of November we were hitting 150 story points per sprint. I wish I had a screen cap of the velocity from those original sprints; Greenhopper is a little cussed and refuses to show you more than 7 sprints back, but it was impressive and everyone got to see how much more work they were completing (as opposed to ‘doing’). I do have one interesting one though:

This is our 6th and following sprints; you see how our average velocity was still increasing (a bit spikily) but in that last sprint we finally got to where we weren’t overpromising and underdelivering, which was an important milestone I congratulated the team on. Nowadays their committed/completed numbers are always very close, which is a sign of maturity.

Just Add Devs – False Start!

After the holiday rush, they asked me and another manager, Kelly, to take over the dev side of PRR as well, so we had the whole ball of wax (doubling the number of people I was managing). We tried to move them straight to full Scrum and also DevOps the team up using the embedded ops engineer model we were using on the other 2.0 teams. PRR is big enough there were enough people for four subteams, so we divided up the group into four sprint teams, assigned a couple ops engineers to each one, and said “Go!”.

This went over pretty much like a lead balloon. It was too much change too fast. Most of the developers were not used to Agile, and trying to mentor four teams at once was difficult. Combined with that was the fact that most of the ops staff was remote in Ukraine, what happened was each Austin-BV-employee-led team didn’t really consider “those ops guys” part of their team (I look around from my desk and see four other devs but don’t see the ops people… Therefore they’re not on my team.) And that ops team was used to working as one team and they didn’t really segment themselves along those lines meaningfully either. Since they were mostly remote, it was hard to break that habit. We tried to manage that for a little while, but finally we had to step back and try again.

Check back soon for Scrum for Operations: Just Add DevOps, where I reveal how we got Agile and DevOps to work for us after all!

17 Comments

Filed under Agile, DevOps

Tagged as agile, DevOps, Operations, scrum, scrum4ops, system administration, systems, web, webops

by Ernest Mueller | July 17, 2013 · 1:27 pm

Scrum for Operations: Fitting In As An Ops Engineer

So far in this series, I’ve introduced the basics of Scrum as it generally is used and explained the practices that make it extremely successful. But that’s for developers, right? If you are in operations, what does this mean to you? How do you fit in? For an ops person, the major challenges are mental – you have to reorient your way of thinking, and then things drop into place very well.

I’m writing from the perspective of a Web operations guy, though I’ve done more traditional sysadmin work and managed infrastructure (and dev) teams over time (and started off as a dev, many years ago). Some of my terminology is oriented towards creating a product and keeping a Web site up, but you should be able to conceptually substitute your own kind of system, just as all different kinds of developers, not just Web developers, use and benefit from agile.

The Team

First, “DevOps.” Get an Ops person assigned to the dev team. This is fundamental – if it’s an externalized relationship, where the dev team is making requests of your “Infrastructure org”, you will not be seen as part of the team and your effectiveness will be extremely diminished. You need to be more or less dedicated to this project, not handling it from some shared work queue. This reinforces the fundamental values of Agile. You join the team, and you dedicate yourself to the overall success of the product you are working on. It is this integration, and the trust that arises from shared goals, that will remove a lot of the traditional roadblocks you are used to facing when dealing with a dev team. A real agile team should have similarly embedded product, QA, and UX folks, it’s not a new idea.

You are not “a UNIX guy” or “A DBA” any more. You are “a member of the Ratings and Reviews team,” and you happen to have a technical specialty. This may seem like sophistry but it’s actually one of the most critical parts of this cultural transformation.

The Backlog

Start thinking of tasks in a customer-feature-facing kind of way for the backlog. For example, no one but you wants to hear about “configuring the SAN,” they want to know that at the end of the sprint “customers will be able to save files to persistent storage.” If what you’re doing doesn’t have any benefit to the end customer – why are you doing it again? You shouldn’t be.

Figure out how to state operational concerns like performance, maintainability, and availability as benefits in the backlog. Some infrastructure stuff belongs in the backlog, other parts of it belong more in standards (e.g. the team Definition of Done now states you have to have monitoring on a new service…). The product manager and dev team aren’t dumb, they will understand that performance, availability, security, ability to release their software, etc. are important goals that have merit in the backlog. The typical story-lingo is “As an X, I want Y so I can do Z.” “As a client, I want my data backed up so that in the case of a disaster, I am minimally affected.” “As an engineer, I want the uptime state of my services monitored so I can ensure customers are being served.”

You will be challenged (and this is good) on items that are “monkey work.” “I need to go delete log files off that server, so it doesn’t crash.” Hey, why are we doing that? Why is it manual? Should we have a story for proper log rotation? Need a developer to help? You will see a virtuous cycle develop to “fix things right.” Most of the devs haven’t seen a lot of the demeaning stuff you’re asked to do, and they’ll try to help fix it.

The Sprint

I’ll be honest, the first time I was confronted with the prospect of breaking up systems work into sprints I thought it was very unlikely it could be done. “Things are either short interrupts or long projects, right, that doesn’t make any sense.” And then I did it, and the scales dropped from my eyes. Remember refactoring. Developers doing agile are used to refactoring, while we are used to only having “one bite at the apple” – if we don’t get the systems all 100% right before we unleash the developers on them, then we won’t be able to change them later right? Wrong!

In a certain sense, sprint planning is a big load off from traditional planning. Infrastructure folks are used to being asked to provide a granular task breakdown and timeline of 6 months worth of work for some big-bang implementation. Then when reality causes the plan to deviate from that, everyone freaks. Agile takes horizon planning and institutionalizes it – you only need to be able to specifically plan your next 2 (or so) weeks, and if you can’t do that you need to try harder. What can you implement in 2 weeks that has some kind of value? Get a Tomcat running sprint 1, then tune it sprint 2, then monitor it sprint 3 – don’t bundle everything up into one huge mass.

Testing

Figure out what unit tests mean to you for things you are implementing. “Nothing” is the wrong answer. If you’re making a network change, for example, there is something you can do to test that short of “waiting for people to complain.” If you are installing tomcat on a server – if you’re using a framework like chef or puppet they’ll have testing options built in, but even if not there’s certain things you can do to ensure its functionality instead of passing it on and causing lost time and rework when someone else finds out it’s not working right.

More to come, meditate upon those truths for a bit – ask questions in the comments!

2 Comments

Filed under DevOps

Tagged as agile, DevOps, Operations, scrum, scrum4ops, system administration, systems, web, webops

by Ernest Mueller | June 24, 2013 · 2:51 pm

Crosspost: How Bazaarvoice Weathered The AWS Storm

For regular agile admin readers, I wanted to point out the post I did on the Bazaarvoice engineering blog, How Bazaarvoice Weathered The AWS Storm, on how we have designed for resiliency to the point where we had zero end user facing downtime during last year’s AWS meltdown and Leapocalypse. It’s a bit late, I wrote it like in July and then the BV engineering blog kinda fell dormant (guy who ran it left, etc.) and we’re just getting it reinvigorated. Anyway, go read the article and also watch that blog for more good stuff to come!

Leave a comment

Filed under Cloud, DevOps

Tagged as availability zone, aws, bazaarvoice, Cloud, DevOps, ec2, leapocalypse, outage, region, resiliency

by Ernest Mueller | June 22, 2013 · 12:17 pm

DevOpsDays Silicon Valley 2013 Day 2 Liveblog

Woooo! Last day of a week of conferencing. DevOpsDays Day 1 was good and I have even more openspace topics I plan to propose next time. As usual this is being livestreamed and will be viewable later as well at bmc.com/devops.

Sponsor Watch… Got to talk to our friends at PagerDuty (alert management) and Datadog (monitoring/dashboarding), we use them and love them. And I got to see Stormpath again, they first showed up at last DevOpsDays with a SaaS hosted auth solution (not like PingIdentity and Okta, they actually store the usernames/passwords for you, Les Hazlewood the Apache Shiro guy started it) and they’re growing quickly. Also talked to SaltStack which does salt, a remote command execution framework. 10gen was here with a MongoDB SaaS backup solution (nice!) and monitoring solution.

Leading the Horses to Drink

By Damon Edwards (@damonedwards) from DTO and now #SimplifyOps.

How to spread DevOps in enterprises. There’s silos you know. The term DevOps may work against you – it’s evangelical and being overused/washed already.

There is no ‘why’ other than the why of the business. Read your Deming/Collins/Four Steps to the Epiphany/etc.

Go ask people… Something.

Develop a common DevOps vision. Not a process because they’ll get blinders on. [Ed: I believe this is a false dichotomy – you should teach both. Vision without process lacks focus and process without vision lacks direction. It’s like accuracy and precision.]

See the system
Focus on flow
Recognize feedback loops

Do a value stream mapping – read Learning to See. OK, this is the meat of the preso – very hard to read though.

Take your information flow and turn it into an artifact flow

Do a timeline analysis, find waste

Metrics. Establish the metric chain of what matters to the business, driven down to a capability which influences what matters to the business, and driven down to an activity over which an infividual can cause/influence outcomes.

Doesn’t require saying “devops.”

Teach concepts
Analysis
Metrics chains
Do something
Iterate

Only takes like 3 days to bootcamp it. Then put in continuous improvement loops.

You can only break silos by brute-force being the boss, but misalignment will reassert itself. Have to change the alignment.

Q&A: Do it with everyone in the same room with whiteboards/postits, it works better than getting fancy

Beyond the Pretty Charts

Toufic Boubez from Metafor. Cofounded Layer 7 and escaped when CA acquired them.

Came from a popular DOD Austin Openspace – see the blog post!

We’ve moved beyond static thresholds – or, at least, everyone thinks they suck. Need more dynamic analytics.
Context is important – planned and known (or should be known) events cause deviation. Correlate events with metric gathering.
Don’t just look at timelines. Check the thinking round Etsy’s Kale and Skyline, many eval methods assume normal metric distribution and that’s uncommon. Look at a histogram of any given data – like latency is usually gamma not gaussian.
Is all data important to collect? There’s argument over that. Get it all and analyze vs figure out what’s important to not waste time.
We all want to automate. Need detection before it’s critical. Can’t always have a human in the loop. Whipping out the control theory – open loop control systems, closed loop – to get self healing systems we need current state/desired state diffing from our monitoring systems and taking action. [Ed. We experimented with this back at NI, we had Sitescope going to a homegrown system called “monolith” that would take actions. Hard to account for all factors though and eventually was discontinued.] Also supervised vs unsupervised loops [Ed: – we might have kept monolith around if it SMSed us and said “memory is high on this server I believe I should restart the java process, is that OK” and we could PagerDuty-like say yea or nay.]

How much data do you need? No more res that twice your highest frequency (Nyquist-Shanon). Most algorithms will smooth/average/etc.

Q&A: Are control systems more appropriate for small not large systems? No – just like in industry, as long as you design for that then it’s not just for toys.

And now I step in for the vendor pitch for Riverbed. Agile Admin Peco left yesterday and the other Riverbed booth guys made themselves scarce, so I did their shout-out for them. They have Zeus EC2 LBs, Aptimize web front end optimizer, and Opnet Appinternals Xpert APM tool! Very cool.

Identifying Waste in your Build Pipeline

Scott Turnquest from Thoughtworks

Tools: Value stream mappings, fishbone analysis, “5 Whys”

So how do we do that value stream mapping? Here we go! [Ed: Oh, this is nice, I was sad that in the DTO presentation they mentioned them and threw some up but didn’t really dive down into one.]

A day of analysis of one small feature – a day of wait, 4 days of dev, 2 mins of wait, 1 hour of acceptance tests, 4 hours of deploy, 1 day in staging, 4 hours to deploy to prod. Note the waste areas – “4 days in dev? Really?” and the long ass deploy windows [Ed: Our value stream looks depressingly like this.] Process cycle efficiency of 75% (value creation time/total time)

So to determine the source of those waste areas, use the fishbone diagram. Had long feedback cycles from structure of code and build/deploy pipelines. Couldn’t test w/o AWS and can’t test individual components, provisioning was serial and repos were flaky.

Fix underlying cause (most impact first) – deploy pipelines. Reduce failure rate of deployments. Half were failing, and failing slow. Moved to AMI baking for reliability. [Ed: They said I was crazy a couple years ago when I said this, “no it’s a foil ball…” Bake when you can!] So this got them from 4 hours to 2 hours, and then parallelized and got down to 25 minutes. This cut down the staging and prod deploys but also the dev time. Process cycle efficiency up to 83%.

5 Whys root cause analysis method. Figured out manual hard to automate deployments were at the root, automated them – don’t be afraid to restructure/redesign when complexity gets in the ways.

Analysis techniques are not just for analysts!

Read Jez Humble’s “Continuous Delivery”, Poppendieck’s “Lean Soft Dev”/”Implementing Lean Soft Dev” , Derby/Larsen “Agile Retrospectives”

Clusters, developers, and the complexity in Infrastructure Automation

Antoni Batchelli of PalletOps. Complexity, essential and accidental. Building a system is simple but the systems are complex at runtime, and “complexity of a system is the degree of difficulty in predicting the properties of the system given the properties of the system’s parts.”

In DevOps we see infrastructure-aware software and concepts moving up into dev processes.

Devs want to run “their own” cluster with all the setups they need – productionlike, but with specific versions/timings/data/code/etc. Don’t care about infra details but want consistent envs/code.

Software has to be infrastructure aware now to autoscale, self-heal, etc. The app is the best informed actor to make/orchestrate infra decisions.

[Ed: This late into a conference week, I get a little irritated about presentations that are not really clear *why* they are telling you what they’re telling you.]

He hates incidental complexity. Me too.

OK, maybe we’re getting to a thesis. Let people solve problems where they are less complex: at the right level of abstraction. Build layers of abstraction – infrastructure, OS, services, actions. Make them into modules, make them functional and polymorphic.

Ignites!

James Wickett (@wickett) on Rugged DevOps and gauntlt for security + DevOps. gauntlt is a gem for continuous security testing as part of your build cycle. BDD your app’s security! Knock Out!!! go to gauntlt.org to get started.

Karthik Gaekwad (@iteration1) on DevOps Culture in the CIA. Devops is culture/automation/measurement/sharing. Seen Zero Dark Thirty? Well, the true story behind that details the COA’s transformation from a split between analysts and operatives especially using Sisterhood, a group of female analysis tracking Bin Laden since 1980. Post 9/11 there was a mass reorg to become more tactical – analysts became Targeters and worked with Operatives hand in hand. Same kind of silo busting. The Phoenix Project is Zero Dark Thirty for DevOps!

Dave Mangot (@davemengot) for DevOps Do’s and Don’ts from Salesforce. Do give everyone the tools they need to do their jobs. Don’t make ops the constraint, Do lots of communicating. Don’t forget to include everyone. Do get ops involved early. Don’t create a front door (loaded) process. Do have integration environments, Don’t forget config management. Do have blameless post-mortems. Don’t use the Phoenix Project as a bludgeon. Do use Agile as a cultural tool. Don’t rely on tools to change culture. Do get executive sponsorship. Don’t do shadow IT. Do use Damon Edward’s levers. Don’t just lecture, it’s a participation sport. Do structure the org around delivery. Don’t make separate DevOps teams or jackets. Do get the whole company involved, DevOps is for everyone.

Jonathan Thorpe – Preventing DevOps success. Not planning for scale. Not having unit tests. Not designing automated tests to scale. Not managing your capacity. Not using your resources effectively. Not using same deployment process for all environments. Not knowing what/where/when/who (activity tracking). Getting covered in ants.

DevOps is the future – John Esser from ancestry.com. What keeps CIOs up at night? Besides ants? IT. Need time to value. Transform mindset/processes/tools/etc. Strangler pattern.

DevOps productivity survey by Oliver White from ZeroTurnaround. DevOps oriented teams spend more time on infrastructure improvements and less on firefighting and support. Problem recoveries are shorter. Release software faster. use more custom tools. Make love for longer time. @rebel_labs

Nathan Harvey on leveling up your skills. Quit! Go to a conference. Try new things. Do a project somewhere. Always be interviewing.

Leave a comment

Filed under Conferences, DevOps

Tagged as build, DevOps, devopsdays, metrics

by Ernest Mueller | June 20, 2013 · 7:09 pm

Velocity 2013 Day 3 Liveblog: Retooling Adobe: A DevOps Journey from Packaged Software to Service Provider

Retooling Adobe: A DevOps Journey from Packaged Software to Service Provider

Srinivas Peri, Adobe and Alex Honor, SimplifyOPS/DTO

Adobe needed to move from desktop, packaged software to a cloud services model and needed a DevOps transformation as well.

Srini’s CoreTech Tools/Infrastructure group tries to transform wasted time to value time (enabling tools).

So they started talking SaaS and Srini went around talking to them about tooling.

Dan Neff came to Adobe from Facebook as operations guru from Facebook. He said “let’s stop talking about tools.” He showed him the 10+ deploys a day at Flickr preso. Time to go to Velocity! And he met Alex and Damon of DTO and learned about loosely coupled toolchains.

They generated CDOT, a service delivery platform. Some teams started using it, then they bought Typekit and Paul Hammond thought it was just lovely.

And now all Adobe software is coming through the cloud. They are not the CoreTech Solution Engineering team – who makes enabling services.

Do something next week! And don’t reinvent the wheel.

How To Do It

First problem to solve. There are islands of tools – CM, package, build, orchestration, package repos, source repos. Different teams, different philosophies.

And actually, probably in each business unit, you have another instantiation of all of the above.

CDOT – their service delivery platform, the 30k foot view

Many different app architectures and many data center providers (cloud and trad). CDOT bridges the gap.

CDOT has a UI and API service atop an integration layer It uses jenkins, rundeck, chef, zabbix, splunk under the covers.

On the code side – what is that? App code, app config, and verification code. But also operations code! It is part of YOUR product. It’s an input to CDOT.

So build (CI). Takes from perforce/github to pk/jenkins, into moddav/nexus, for cloud stuff bake to an AMI, promote packages to S3 and AMIs to an AMI repo.

For deploy (CD), jenkins calls rundeck and chef server. Rundeck instantiates the cloudformation or whatever and does high level orchestration, the AMis pull chef recipes and packages from S3, and chef does the local orchestration. Is it pull or push? Both/either. You can bake and you can fry.

So feature branches – some people don’t need to CD to prod, but they sure do to somewhere. So devs can mess with feature branches on dev boxes, but then all master checkins CD to a CD environment. You can choose how often to go to prod.

Have a cool “devops workbench” UI with the deployment pipeline and state. So everyone has one-click self service deployment with no manual steps, with high confidence.

Now, CDOT video! It’s not really for us, it’s their internal marketing video to get teams to uptake CDOT. Getting people on board is most of the effort!

What’s the value prop?

Save people time
Alleviate their headaches
Understand their motivations (for when they play politics)
Listen to and address their fears

Bring testimonials, data, presentations, do events, videos! Sell it!

“Get out of your cube and go talk to people”

Think like a salesperson. Get users (devs/PMs) on board, then the buyers (managers/budget folks), partners and suppliers (other ops guys).

Leave a comment

Filed under Conferences, DevOps

Tagged as DevOps, SaaS, transformation, velocity, velocityconf, velocityconf13

Tag Archives: DevOps

DevOpsDays Austin Is Coming!

Agile Organization Incorporating Various Disciplines

Two Fundamental Discoveries

Organizational Options

Separate Teams

Embedded Crossfunctional Service Teams

Fully Integrated Service Teams

Project Based Organization

Trusted Software Alliance launches new podcast and news series

A DevOps Thanksgiving

Scrum for Operations: How We Got Started

Moving Ops to Agile

Standups

Backlog

Swimlanes

Sprints

Poker Planning

Velocity

Just Add Devs – False Start!

Scrum for Operations: Fitting In As An Ops Engineer

DevOpsDays Silicon Valley 2013 Day 2 Liveblog

Leading the Horses to Drink

Beyond the Pretty Charts

Identifying Waste in your Build Pipeline

Clusters, developers, and the complexity in Infrastructure Automation

Ignites!

Velocity 2013 Day 3 Liveblog: Retooling Adobe: A DevOps Journey from Packaged Software to Service Provider

Retooling Adobe: A DevOps Journey from Packaged Software to Service Provider

How To Do It

Subscribe

Recent Comments

Recent Posts

Austinites

Cloud

DevOps

Archives