Monthly Archives: June 2013

Notes and Tweets from DevOps Days Silicon Valley

Over the last few months I have been using TweetScriber (an iPad app) to take notes at conferences. The really nice part about it is that it is a note taking application that allows you to live-tweet and record other people’s tweets all in one place. At DevOps Days Silicon Valley 2013, I tried to use TweetScriber to record what happened and capture what others were saying on twitter as well.

Here are my raw notes from DevOps Days Silicon Valley Day 1 and DevOps Days Silicon Valley Day 2. I also ran an open space on doing security testing with gauntlt and recorded those notes as well.

The Agile Admin team is working on putting together a summary of DevOps Days and Velocity Conference, but until that is released the raw notes will have to suffice.

Filed under DevOps

Crosspost: How Bazaarvoice Weathered The AWS Storm

For regular agile admin readers, I wanted to point out the post I did on the Bazaarvoice engineering blog, How Bazaarvoice Weathered The AWS Storm, on how we have designed for resiliency to the point where we had zero end user facing downtime during last year’s AWS meltdown and Leapocalypse. It’s a bit late, I wrote it like in July and then the BV engineering blog kinda fell dormant (guy who ran it left, etc.) and we’re just getting it reinvigorated.  Anyway, go read the article and also watch that blog for more good stuff to come!

Filed under Cloud, DevOps

Velocity 2013 Wrapup

Whew, we’re all finally back home from the conferencing. Fun was had by all.

@iteration1, @ernestmueller, @wickett

Over the next week I’ll go back to the liveblog articles and put in links to slides/videos where I can find them (feel free to post ones you know of in the comments on the appropriate post!). We’ll also try to sum up the best takeaways into a Velocity 2013 and DevOpsDays Silicon Valley 2013 quick guide, for those without the patience to read the extended dance remix.

Filed under Conferences, DevOps

DevOpsDays Silicon Valley 2013 Day 2 Liveblog

Woooo!  Last day of a week of conferencing.  DevOpsDays Day 1 was good and I have even more openspace topics I plan to propose next time.  As usual this is being livestreamed and will be viewable later as well at bmc.com/devops.

Sponsor Watch… Got to talk to our friends at PagerDuty (alert management) and Datadog (monitoring/dashboarding), we use them and love them. And I got to see Stormpath again, they first showed up at last DevOpsDays with a SaaS hosted auth solution (not like PingIdentity and Okta, they actually store the usernames/passwords for you, Les Hazlewood the Apache Shiro guy started it) and they’re growing quickly. Also talked to SaltStack which does salt, a remote command execution framework. 10gen was here with a MongoDB SaaS backup solution (nice!) and monitoring solution.

Leading the Horses to Drink

By Damon Edwards (@damonedwards) from DTO and now #SimplifyOps.

How to spread DevOps in enterprises.  There’s silos you know.  The term DevOps may work against you – it’s evangelical and being overused/washed already.

There is no ‘why’ other than the why of the business. Read your Deming/Collins/Four Steps to the Epiphany/etc.

Go ask people… Something.

Develop a common DevOps vision. Not a process because they’ll get blinders on. [Ed: I believe this is a false dichotomy – you should teach both. Vision without process lacks focus and process without vision lacks direction.  It’s like accuracy and precision.]

  1. See the system
  2. Focus on flow
  3. Recognize feedback loops

Do a value stream mapping – read Learning to See.  OK, this is the meat of the preso – very hard to read though.

Take your information flow and turn it into an artifact flow

Do a timeline analysis, find waste

Metrics.  Establish the metric chain of what matters to the business, driven down to a capability which influences what matters to the business, and driven down to an activity over which an individual can cause/influence outcomes.

Doesn’t require saying “devops.”

  1. Teach concepts
  2. Analysis
  3. Metrics chains
  4. Do something
  5. Iterate

Only takes like 3 days to bootcamp it. Then put in continuous improvement loops.

You can only break silos by brute-force being the boss, but misalignment will reassert itself. Have to change the alignment.

Q&A: Do it with everyone in the same room with whiteboards/postits, it works better than getting fancy

Beyond the Pretty Charts

Toufic Boubez from Metafor.  Cofounded Layer 7 and escaped when CA acquired them.

Came from a popular DOD Austin Openspace – see the blog post!

  1. We’ve moved beyond static thresholds – or, at least, everyone thinks they suck. Need more dynamic analytics.
  2. Context is important – planned and known (or should be known) events cause deviation. Correlate events with metric gathering.
  3. Don’t just look at timelines. Check the thinking round Etsy’s Kale and Skyline, many eval methods assume normal metric distribution and that’s uncommon. Look at a histogram of any given data – like latency is usually gamma not gaussian.
  4. Is all data important to collect? There’s argument over that.  Get it all and analyze vs figure out what’s important to not waste time.
  5. We all want to automate. Need detection before it’s critical. Can’t always have a human in the loop. Whipping out the control theory – open loop control systems, closed loop – to get self healing systems we need current state/desired state diffing from our monitoring systems and taking action. [Ed: We experimented with this back at NI, we had Sitescope going to a homegrown system called “monolith” that would take actions. Hard to account for all factors though and eventually it was discontinued.] Also supervised vs unsupervised loops. [Ed: We might have kept monolith around if it SMSed us and said “memory is high on this server, I believe I should restart the java process, is that OK” and we could PagerDuty-like say yea or nay.]
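The current state/desired state diffing in point 5 can be sketched as a small reconcile loop. This is only an illustration in Python, not anything shown in the talk: the `Action` type, the `plan` and `reconcile` functions, the state dictionaries, and the `approve` hook are all hypothetical names. The `approve` hook is where a supervised loop would ask a human before acting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    """A hypothetical corrective step, e.g. restarting a process."""
    key: str
    current: object
    desired: object


def plan(current: Dict[str, object], desired: Dict[str, object]) -> List[Action]:
    """Diff observed state against desired state and list the drift."""
    return [
        Action(key, current.get(key), want)
        for key, want in sorted(desired.items())
        if current.get(key) != want
    ]


def reconcile(
    current: Dict[str, object],
    desired: Dict[str, object],
    approve: Callable[[Action], bool] = lambda action: True,
) -> Dict[str, object]:
    """Closed loop: apply every approved action and return the new state.

    With the default approve hook the loop is unsupervised. Passing a hook
    that asks a human (pager, chat, SMS) makes it supervised.
    """
    new_state = dict(current)
    for action in plan(current, desired):
        if approve(action):
            new_state[action.key] = action.desired
    return new_state


# Example: only the java process has drifted from the desired state.
observed = {"java": "stopped", "nginx": "running"}
wanted = {"java": "running", "nginx": "running"}

unsupervised = reconcile(observed, wanted)
# A supervised run where the human says "nay" leaves the state untouched.
declined = reconcile(observed, wanted, approve=lambda action: False)
```

The hard part the Ed. note describes, accounting for all the factors, lives in how `plan` decides what counts as drift. The sketch only compares values for equality.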

How much data do you need?  No more res than twice your highest frequency (Nyquist-Shannon). Most algorithms will smooth/average/etc.
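As arithmetic, the Nyquist-Shannon point means your collection interval should be at most half the period of the fastest pattern you care about. A minimal sketch, assuming a hypothetical helper name and a made-up five minute cycle:

```python
def max_sample_interval(shortest_period_s: float) -> float:
    """Longest collection interval that still resolves a cycle of this period.

    Nyquist-Shannon: sample at no less than twice the highest frequency,
    which means at least two samples per shortest period of interest.
    """
    if shortest_period_s <= 0:
        raise ValueError("period must be positive")
    return shortest_period_s / 2.0


# Hypothetical example: the fastest pattern you care about repeats every 5 minutes.
interval = max_sample_interval(5 * 60)
print(f"collect at least every {interval:.0f} seconds")
```

Collecting much more often than that mostly buys data that the smoothing and averaging will throw away anyway.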

Q&A: Are control systems more appropriate for small not large systems?  No – just like in industry, as long as you design for that then it’s not just for toys.

And now I step in for the vendor pitch for Riverbed.  Agile Admin Peco left yesterday and the other Riverbed booth guys made themselves scarce, so I did their shout-out for them. They have Zeus EC2 LBs, Aptimize web front end optimizer, and Opnet Appinternals Xpert APM tool!  Very cool.

Identifying Waste in your Build Pipeline

Scott Turnquest from Thoughtworks

Tools: Value stream mappings, fishbone analysis, “5 Whys”

So how do we do that value stream mapping? Here we go!  [Ed: Oh, this is nice, I was sad that in the DTO presentation they mentioned them and threw some up but didn’t really dive down into one.]

A day of analysis of one small feature –  a day of wait, 4 days of dev, 2 mins of wait, 1 hour of acceptance tests, 4 hours of deploy, 1 day in staging, 4 hours to deploy to prod. Note the waste areas – “4 days in dev?  Really?” and the long ass deploy windows [Ed: Our value stream looks depressingly like this.] Process cycle efficiency of 75% (value creation time/total time)
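Process cycle efficiency is just value creation time over total time, so it is easy to compute from a timeline like the one above. This is a rough sketch with loud assumptions: the notes don’t say how long a “day” is or which steps the speaker counted as value-creating, so the 8-hour day and the True/False column below are guesses. Under those guesses the number happens to land near the quoted 75%.

```python
HOURS_PER_DAY = 8  # assumption: a "day" in the notes is an 8-hour working day

# (step, duration in hours, counts as value-creating?)
# The True/False column is this sketch's guess, not something stated in the talk.
timeline = [
    ("analysis", 1 * HOURS_PER_DAY, True),
    ("wait", 1 * HOURS_PER_DAY, False),
    ("dev", 4 * HOURS_PER_DAY, True),
    ("wait", 2 / 60, False),
    ("acceptance tests", 1, True),
    ("deploy", 4, True),
    ("staging", 1 * HOURS_PER_DAY, False),
    ("deploy to prod", 4, True),
]


def process_cycle_efficiency(steps):
    """Value creation time divided by total elapsed time."""
    total = sum(hours for _, hours, _ in steps)
    value = sum(hours for _, hours, is_value in steps if is_value)
    return value / total


pce = process_cycle_efficiency(timeline)
print(f"process cycle efficiency: {pce:.0%}")
```

Flip a few of the flags and the percentage moves a lot, which is a good reason to agree on what counts as value before comparing numbers between teams.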

So to determine the source of those waste areas, use the fishbone diagram. Had long feedback cycles from structure of code and build/deploy pipelines. Couldn’t test w/o AWS and can’t test individual components, provisioning was serial and repos were flaky.

Fix underlying cause (most impact first) – deploy pipelines. Reduce failure rate of deployments. Half were failing, and failing slow. Moved to AMI baking for reliability. [Ed: They said I was crazy a couple years ago when I said this, “no it’s a foil ball…” Bake when you can!] So this got them from 4 hours to 2 hours, and then parallelized and got down to 25 minutes. This cut down the staging and prod deploys but also the dev time. Process cycle efficiency up to 83%.
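The “provisioning was serial” to “parallelized” step can be sketched with a thread pool. This is an assumption-heavy illustration: `provision` is a stand-in that just sleeps, the node names are made up, and the talk did not say how they actually parallelized.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision(node: str) -> str:
    """Stand-in for a slow provisioning call (hypothetical)."""
    time.sleep(0.1)
    return f"{node}: ready"


def provision_serial(nodes):
    """Provision one node after another."""
    return [provision(n) for n in nodes]


def provision_parallel(nodes, max_workers=8):
    """Provision nodes concurrently; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(provision, nodes))


nodes = [f"node-{i}" for i in range(4)]

start = time.perf_counter()
serial_result = provision_serial(nodes)
serial_secs = time.perf_counter() - start

start = time.perf_counter()
parallel_result = provision_parallel(nodes)
parallel_secs = time.perf_counter() - start
```

Threads fit here because provisioning is mostly waiting on remote calls. The serial run takes roughly the sum of the waits and the parallel run roughly the longest single wait.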

5 Whys root cause analysis method. Figured out manual, hard-to-automate deployments were at the root, automated them – don’t be afraid to restructure/redesign when complexity gets in the way.

Analysis techniques are not just for analysts!

Read Jez Humble’s “Continuous Delivery”, Poppendieck’s “Lean Soft Dev”/“Implementing Lean Soft Dev”, Derby/Larsen “Agile Retrospectives”

Clusters, developers, and the complexity in Infrastructure Automation

Antoni Batchelli of PalletOps. Complexity, essential and accidental. Building a system is simple but the systems are complex at runtime, and “complexity of a system is the degree of difficulty in predicting the properties of the system given the properties of the system’s parts.”

In DevOps we see infrastructure-aware software and concepts moving up into dev processes.

Devs want to run “their own” cluster with all the setups they need – productionlike, but with specific versions/timings/data/code/etc. Don’t care about infra details but want consistent envs/code.

Software has to be infrastructure aware now to autoscale, self-heal, etc. The app is the best informed actor to make/orchestrate infra decisions.

[Ed: This late into a conference week, I get a little irritated about presentations that are not really clear *why* they are telling you what they’re telling you.]

He hates incidental complexity. Me too.

OK, maybe we’re getting to a thesis. Let people solve problems where they are less complex: at the right level of abstraction. Build layers of abstraction – infrastructure, OS, services, actions. Make them into modules, make them functional and polymorphic.

Ignites!

James Wickett (@wickett) on Rugged DevOps and gauntlt for security + DevOps. gauntlt is a gem for continuous security testing as part of your build cycle. BDD your app’s security! Knock Out!!! go to gauntlt.org to get started.

Karthik Gaekwad (@iteration1) on DevOps Culture in the CIA. Devops is culture/automation/measurement/sharing. Seen Zero Dark Thirty? Well, the true story behind that details the CIA’s transformation from a split between analysts and operatives, especially using Sisterhood, a group of female analysts tracking Bin Laden since 1980. Post 9/11 there was a mass reorg to become more tactical – analysts became Targeters and worked with Operatives hand in hand. Same kind of silo busting. The Phoenix Project is Zero Dark Thirty for DevOps!

Dave Mangot (@davemangot) for DevOps Do’s and Don’ts from Salesforce.  Do give everyone the tools they need to do their jobs. Don’t make ops the constraint. Do lots of communicating. Don’t forget to include everyone. Do get ops involved early. Don’t create a front door (loaded) process. Do have integration environments. Don’t forget config management. Do have blameless post-mortems. Don’t use the Phoenix Project as a bludgeon. Do use Agile as a cultural tool. Don’t rely on tools to change culture. Do get executive sponsorship. Don’t do shadow IT. Do use Damon Edwards’ levers. Don’t just lecture, it’s a participation sport. Do structure the org around delivery. Don’t make separate DevOps teams or jackets. Do get the whole company involved, DevOps is for everyone.

Jonathan Thorpe – Preventing DevOps success. Not planning for scale. Not having unit tests. Not designing automated tests to scale. Not managing your capacity. Not using your resources effectively. Not using same deployment process for all environments. Not knowing what/where/when/who (activity tracking). Getting covered in ants.

DevOps is the future – John Esser from ancestry.com. What keeps CIOs up at night? Besides ants? IT. Need time to value. Transform mindset/processes/tools/etc. Strangler pattern.

DevOps productivity survey by Oliver White from ZeroTurnaround. DevOps oriented teams spend more time on infrastructure improvements and less on firefighting and support. Problem recoveries are shorter. Release software faster. Use more custom tools. Make love for longer time. @rebel_labs

Nathan Harvey on leveling up your skills. Quit!  Go to a conference. Try new things. Do a project somewhere. Always be interviewing.

Filed under Conferences, DevOps

DevOpsDays Silicon Valley Day 1 Presentations

All right, the corporatey part of the week (Velocity) is over, and the tech Illuminati have stayed for DevOpsDays Silicon Valley (used to be Mountain View) – with like 500 people!

The hashtag is #devopsdays and all the presentations were live streamed at the usual place for DevOpsDays live streaming, www.bmc.com/devops.  The videos are now all up on Vimeo.

To open, a funny Point Break DevOps parody!  Ah, makes me want to watch that movie again.

DevOps + Agile = Business Transformation

The first talk is from Jesse Robbins (@jesserobbins)!  Ex-Velocity co-host and co-founder of Opscode, he has now found a home in the TECH UNDERGROUND which is DevOpsDays. He started Velocity because he couldn’t share all the secret stuff they were doing at Amazon but knew it was so important and crucial to the Web and thus the world… Sometimes frighteningly so.

DevOps, he says, is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally. The right tools and culture are critical to doing this successfully.

The Internet is becoming pervasive.  Applications became customer service vehicles. Walmart and Amazon both understand this. Email killed the post office. These rips in the social fabric reveal something better. These changes are coming faster and faster and the technology that does this is ours – we build and run it.

Misaligned incentives cause conflict. “Operant conditioning.” People know what the good guys are doing, but they just can’t change themselves to do it – elephants can’t fly just by flapping their ears harder.

You can do your keep-it-small DevOps effort, but eventually you have to say “if we don’t do this everywhere we will fail” – that’s not a business or technology problem, it’s a culture problem.  He’s given this speech inside a bunch of organizations and knows how much resistance there is to change because they all wriggle around like itchy bear cubs when he says it.

Circuit City’s downfall and Blockbuster’s downfall due to Netflix are examples of cultures making agility impossible.  You can’t “agile out” of that. You can provide tools and culture but the overall foundation has to spread. True story, Blockbuster decided the brilliant way to get out of its death spiral was to buy Circuit City, which was also in a death spiral.  And “make it up in volume” I guess.  Is “being a meathead” a culture problem? I reckon.

Conway’s Law – you make things that are copies of your org structure.

Fundamental attributes of successful cultures:

  1. shared mission and incentives
  2. infrastructure as code
  3. application as services
  4. dev+ops+all as teams

Successful practices:

Full stack automation, commodity hw or cloud, reliability in the software, infrastructure APIs, code infra services – infra as product, app as customer.

Service orientation. versioned APIs, resiliency (design for failure), storage abstraction, push complexity up the stack, deep instrumentation

Agile, trust basis, shared metrics and monitoring, incident management, service owners on call, tight integration (maybe you end up with dedicated network or sec oncall, like SREs,  but at the core still collaborative), continuous integration, SRE/SRO to spread concepts, game days.

It takes time – amazon.com didn’t switch to EC2 till Nov 10, 2010.

Changing culture:

  • Start small, build trust & safety
  • Create champions
  • Use metrics to build confidence
  • Celebrate successes
  • Exploit compelling events – cause moments of openness

Continuous Quality: What DevOps Means for QA

By Jeff Sussna, @jeffsussna.

Old definition of quality – “does the software meet the spec?” But agile is about delivering value and cloud is about turning software into services. New definition of quality is “does the service help customers accomplish their jobs-to-be-done.”

A restaurant isn’t just about food delivery, there’s a lot of value creation in the whole chain. A service needs functionality, operability, deliverability, coherency (does it engage me throughout my journey).

New approaches include user-centered design, test-driven development, continuous delivery, MTTR over MTBF; build in testing and learn from failure.
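The “MTTR over MTBF” point has simple arithmetic behind it. Using the standard steady-state formula, availability = MTBF / (MTBF + MTTR), recovering twice as fast buys the same availability as failing half as often. A minimal sketch with made-up numbers (the 1000 hour and 1 hour figures are hypothetical, not from the talk):

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime as a fraction of total time."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical service: fails about every 1000 hours, takes 1 hour to fix.
baseline = availability(1000, 1)
halved_mttr = availability(1000, 0.5)   # recover twice as fast
doubled_mtbf = availability(2000, 1)    # fail half as often

print(f"baseline      {baseline:.5f}")
print(f"halve MTTR    {halved_mttr:.5f}")
print(f"double MTBF   {doubled_mtbf:.5f}")
```

The two improvements come out identical, and in most shops halving the recovery time is the cheaper one to get.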

So QA changes. Boundaries blur and automation takes over the manual activities. New and more valuable role: represent the “service not software” perspective, watchdog those 4 attributes.

QA engineers need to lift their gaze above the mechanics of testing, treat tests as code, focus on building quality into the system (quality advocate). New skills include understanding and thinking about service (e.g. outage comms), ops (sec, monitoring), process/automation

[Ed: If we add all this devopsy stuff into the definition of done, QA should look at all of it.]

Good testers see systems and their parts (and gaps), ask probing questions, design good tests, engage that proficiency in design and test plan critiques.

So we need this new kind of testing as well as the old kinds, so with continuous delivery how do we catch up?  People give a lot of “buts” about automated functional testing, but there are frameworks and DSLs that allow you to make changeable, encapsulated testing.  And it’s a process problem – no one asks what it’ll take to test the system. Write code and tests together, commit them together.

Operability and deliverability need testing. Design for internal users too…

You still want QA (instead of obsoleting them) as attached to the customer and as an antidote to confirmation bias.

Continuous Quality – everyone is  testing all the time, quality infused, QA is a mirror for the organization. There is still specialization.

Shout out to @guidostompff of designinteams.com.

Is your team instrument rated?

By J. Paul Reed (@SoberBuildEng) – also see the podcast theshipshow.com!

Culture. Is it hugs and beer? No, it’s incentives + human factors.

Why aviation as a DevOps analogy? It progressed from craft to trade to science to industry. DevOps is in the “late trade” phase of that development.

Incident response is good but the house is already on fire by then. In aviation there’s a lot of scale and you want to avoid the incidents in the first place.

Learning to fly – first, you learn visual flight rules. Use your eyeballs. Then you move to instrument flight rules – flying in the system.

Flying by instrument relies on standardization, communication (precision), expectations (responsibilities in a situation), remediation. It is not static, blindly relying on automation or process, or fun-verboten.

How to get there?  Define your current process even if it’s weird, focusing on operational requirements, derive primitives, define an operational dictionary, and make sure the nonfunctional requirements are owned.

Formalize roles, responsibilities. There should be clear transfer of control on who “has the ball.” Drill/train and delegate. Priority classes. Fly|navigate|communicate.

Understand your org’s limitations.

Holding patterns/WIP are bad because they add chaos to the system.

Investigate outcomes. Should you have an external team investigate? “No blame” postmortems aren’t about not being a jerk and not making people sad; they exist because it’s very unlikely a failure is “one guy’s fault” and it’s a red herring to think so. [I made a lovely “Root cause is a myth” custom t-shirt at the con! -Ed.]

You should have a day-to-day operational model that accounts for incentives and the human factors that make people able to deliver on them.

Leveling Up a New Engineer in a Devops Culture; Healthy Sustainability

By Gary Foster and Mercedes Coyle from Scripps

You want to hire a new engineer, teach them “our way,” inculcate a devops mindset from the beginning, add good practices and training to the local labor pool, and pay it forward.

Identify needs and outcomes desired and get a mentor.  THEN go hire! They go to hackathons and stuff to hire. Incubators, boot camps (e.g. Hackbright).

And now the new engineer! She was looking for a place where she could get up to speed quickly, support and challenge but no hand holding, senior engineers to help a new engineer grow. She had basic skills in coding/testing/deploying and willingness to learn.

What to do on the job as a new person? Question what you don’t understand, avoid perfectionism, and speak up.

The mentor’s responsibility is patience, giving them responsibility like seasoned engineers, ask them for ideas, teach problem solving not syntax and don’t give the answer.

So train ’em, listen to ’em, form a cult around ’em. Take responsibility for bad habits.

Ignite Time!!!!

Adrian Cockcroft of Netflix (@adrianco) on beer, pineapples, and bottlenecks. The “Cockcroft headroom plot” helps you see when there’s serialization due to a bottleneck.

Peco!!! @bproverb on how we have an incident driven culture and effectively reward failure.  “Actionable alerts” are reactive and often we have sparse bad data. Analyze and track close calls, reward for prevention. Spend time with your data. No need to theorize if you have data, you can track close calls and pursue root cause. Find analyst ninjas. Close call focused analysis.

David Hatten from UrbanCode/IBM. The positive powers of negative thinking. And nihilism. And criticism.  Somebody needs a nap. Read Be Nice To Programmers.

Chantell Smith from ITSM Academy (subbing in for Jayne Groll) – what is devops culture? a multicultural society of frameworks and tools and standards and whatnot. There’s evangelists and detractors. Need communication to get over the cultural divide. Use the “git r done” scrum, not just for devs. Pair with a kanban board. ITSM is still a good thing and not lame! Let’s get a common dictionary/vocabulary. [Ed: So Gene and co., stop slacking on the devops cookbook!]

Systems theory for enjoyment of AWS – read John Gall’s Systemantics/The Systems Bible. Systems in general work poorly or not at all. Complex systems are always broken somewhere. Some simple services don’t even really work… Start simple and working and grow to complex and working. @whirlycott from Stackdriver (Philip Jacob)

Openspaces

I attended three openspaces on Day 1 afternoon.

Women in DevOps

The first was on getting more women into DevOps/related tech jobs. There were a lot of people and so we didn’t get too deep into any specific area of that. It was noted that benefits and especially maternity leave were super important and a good thing to stress in your job postings. Also that women are likely to not apply to “you must know all these 20 things” job descriptions. Though I’ve known guy engineers that fall into the same trap.  “I only meet 19 of those, I’d best not apply.” Hint as a hiring manager – if you meet like half of the things, you’re well advised to put a resume in!

We churned a bit over the fact that largely, people are hiring by exercising their known-people networks, and since historically more engineers are male, that tends to be self-reinforcing. You can go deliberately look into female-tech boot camps and the like.

In the end, I think the core problem is that we’re working at a fast pace. We put out a job posting (or a call for DevOpsDays presenters, as was brought up as an example) and we look through the responses we get.  If there’s no responses from women, then we can’t include them. But to break through that, we have to take the time to deliberately reach out (and figure out where to reach out to).

There was some talk about “wiping identifying information off” but I think that’s a blind alley.  I’m personally more likely to interview/etc. a woman or minority for an engineering position to try to level the playing field, if all the resumes are “Candidate 26” then so much for that.  I mean, maybe it’s true there’s a lot of old school tech companies out there who are like “wimmen on the front lines with us? Never!” but I have to say I’ve never seen that.

My main takeaway was that we should probably get the female engineers we have on staff, ask them to super-plumb their social networks, and get their views on what aspects of job descriptions/interviews/work environments are or are not attractive to them and double down on it.

Where the Hell are the Product Managers?

The second was one I proposed, entitled “Where the Hell are the Product Managers?” DevOps is nominally about bringing Ops into the agile team that is already a mashup of Product, Development, and QA. But unfortunately, despite 10 years post-agile manifesto, I find that healthy PM embedding into the agile team is honored more in the breach than in the observance.  Furthermore, in terms of owning “nonfunctional” requirements, or God forbid, an entire platform-type product, they tend to not want to do that.

We had a good discussion; some people had good PM engagement and others didn’t. Few had success with PMs doing effective prioritization of nonfunctional requirements and most “platform teams” didn’t have a PM, though some did and reported that it was super awesome. In fact, Bryan Dove from here at Bazaarvoice talked about one team he worked on where the designer and marketing person came to colocate with the team as well and it was very effective.

The main takeaway was to continue to try to push the agile practice of crossfunctional, embedded and ideally colocated teams, because the results are so much better. And if one needs to hire more X (PMs, Ops, whatever) so that there can be one per product team, do it.

Running a DevOpsDays

The third was about running a DevOpsDays event.  Since I helped run DevOpsDays Austin I went to that to share the love.  If you’re looking to run one, we’ve made our budget and planning docs and everything available for others to crib from. My short playbook is:

  1. Get around 8 people as organizers, from a mix of companies.  2 will punk out and the other 6 will be able to share the load.
  2. Find a venue, that’s the most important thing.  It’ll determine your capacity and whether you’ll need to charge. We did a free DevOpsDays and had a large (>30%) no show rate, and then did a $120 DevOpsDays and had a small (<10%) no show rate.
  3. Don’t get fancy.  Nail the hard requirements and then if you get excess sponsor money, add on other goodies.  For DOD Austin we added a band and a movie and more swag and more snacks later as our bank account swelled, but we could have cut off after venue/internet/some food and been done with it.
  4. BMC loves to do the A/V and stream the event! But beware, once they leave after the morning events you’ll be without mikes and stuff.
  5. Patrick Debois has the usual schedule/format for you to use.
  6. Don’t worry about sponsor money, they’re lining up to pay you. It’s more important to set expectations – this isn’t a “high traffic sales leads” event and you don’t get the attendees’ emails – it’s better to send engineers than salespeople, you’re trying to affect influencers.

And that’s Day 1, expanded!

Filed under Conferences, DevOps

Velocity 2013 Day 3 Liveblog: Retooling Adobe: A DevOps Journey from Packaged Software to Service Provider

Retooling Adobe: A DevOps Journey from Packaged Software to Service Provider

Srinivas Peri, Adobe and Alex Honor, SimplifyOPS/DTO

Adobe needed to move from desktop, packaged software to a cloud services model and needed a DevOps transformation as well.

Srini’s CoreTech Tools/Infrastructure group tries to transform wasted time to value time (enabling tools).

So they started talking SaaS and Srini went around talking to them about tooling.

Dan Neff came to Adobe from Facebook as an operations guru.  He said “let’s stop talking about tools.” He showed Srini the 10+ deploys a day at Flickr preso. Time to go to Velocity!  And he met Alex and Damon of DTO and learned about loosely coupled toolchains.

They generated CDOT, a service delivery platform. Some teams started using it, then they bought Typekit and Paul Hammond thought it was just lovely.

And now all Adobe software is coming through the cloud.  They are now the CoreTech Solution Engineering team – who makes enabling services.

Do something next week! And don’t reinvent the wheel.

How To Do It

First problem to solve. There are islands of tools – CM, package, build, orchestration, package repos, source repos. Different teams, different philosophies.

And actually, probably in each business unit, you have another instantiation of all of the above.

CDOT – their service delivery platform, the 30k foot view

Many different app architectures and many data center providers (cloud and trad). CDOT bridges the gap.

CDOT has a UI and API service atop an integration layer. It uses jenkins, rundeck, chef, zabbix, splunk under the covers.

On the code side – what is that? App code, app config, and verification code. But also operations code! It is part of YOUR product. It’s an input to CDOT.

So build (CI).  Takes from perforce/github to pk/jenkins, into moddav/nexus, for cloud stuff bake to an AMI, promote packages to S3 and AMIs to an AMI repo.

For deploy (CD), jenkins calls rundeck and chef server. Rundeck instantiates the cloudformation or whatever and does high level orchestration, the AMIs pull chef recipes and packages from S3, and chef does the local orchestration.  Is it pull or push?  Both/either. You can bake and you can fry.

So feature branches – some people don’t need to CD to prod, but they sure do to somewhere.  So devs can mess with feature branches on dev boxes, but then all master checkins CD to a CD environment.  You can choose how often to go to prod.
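The branch rule described above (feature branches stay on dev boxes, every master check-in goes to the CD environment, prod is a separate choice) can be sketched as a tiny routing function. The function name and the environment labels `dev`, `cd`, and `prod` are made up for illustration; this is not CDOT’s actual API.

```python
def deploy_targets(branch: str, promote_to_prod: bool = False) -> list:
    """Return the environments a commit on `branch` is delivered to."""
    if branch != "master":
        return ["dev"]          # feature branches stay on dev boxes
    targets = ["cd"]            # every master check-in goes to the CD environment
    if promote_to_prod:
        targets.append("prod")  # going to prod is a separate, team-paced choice
    return targets


feature = deploy_targets("feature/new-login")
mainline = deploy_targets("master")
release = deploy_targets("master", promote_to_prod=True)
```

Keeping the prod decision as an explicit flag mirrors the “you can choose how often to go to prod” point: the pipeline is always ready, the team picks the pace.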

Have a cool “devops workbench” UI with the deployment pipeline and state. So everyone has one-click self service deployment with no manual steps, with high confidence.

Now, CDOT video! It’s not really for us, it’s their internal marketing video to get teams to uptake CDOT.  Getting people on board is most of the effort!

What’s the value prop?

  • Save people time
  • Alleviate their headaches
  • Understand their motivations (for when they play politics)
  • Listen to and address their fears

Bring testimonials, data, presentations, do events, videos!  Sell it!

“Get out of your cube and go talk to people”

Think like a salesperson. Get users (devs/PMs) on board, then the buyers (managers/budget folks), partners and suppliers (other ops guys).

Filed under Conferences, DevOps

Velocity 2013 Liveblog Day 3: Managing Incidents In The Wild

Managing Incidents In The Wild

Got here late! By Jonathan Reichhold (@jreichhold) from Twitter.

“Facebook is for useless posts, Twitter is for making fun of celebrities, and Instagram is for young people.” -My 11 year old

Step 2: Set Expectations

set expectations for times of failure–set communication methods, test your escalation tree

Be realistic & ambitious. Prioritize what can be fixed and fix it in its due time

Postmortems – improvement has to be part of the process.

Teamwork – management has to support site reliability as a feature, or you burn out your ops guys

Distributed systems fail – have to be robust against things that don’t happen “a lot” at small scale.  A 1 in 1,000,000 issue is EVERY DAMN MINUTE at scale. Design more robust
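The “1 in 1,000,000” line is worth doing the arithmetic on. A quick sketch, with hypothetical request rates (these are illustrative numbers, not Twitter’s actual traffic):

```python
def expected_hits_per_minute(requests_per_second: float, odds: float) -> float:
    """Expected occurrences per minute of an event with probability `odds` per request."""
    return requests_per_second * 60 * odds


one_in_a_million = 1 / 1_000_000

# Hypothetical traffic levels for a small site and a large one.
small_site = expected_hits_per_minute(10, one_in_a_million)
big_site = expected_hits_per_minute(20_000, one_in_a_million)

print(f"small site: one hit about every {1 / small_site / 60:.0f} hours")
print(f"big site:   about {big_site:.1f} hits per minute")
```

At 10 requests a second the issue shows up roughly once a day and looks like a fluke. At 20,000 a second it is happening more than once a minute, all day.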

Large systems take time to design, stabilize in prod.

Don’t assume.  Be rigorous and vigilant.

Degrade gracefully, shed load
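One common way to shed load is to stop admitting optional work before you stop admitting critical work. Here is a minimal sketch of that idea; the class name, the two limits, and the critical/optional split are all assumptions for illustration, not something described in the talk.

```python
class LoadShedder:
    """Reject optional work first, then everything, as in-flight load climbs."""

    def __init__(self, soft_limit: int, hard_limit: int):
        if not 0 < soft_limit <= hard_limit:
            raise ValueError("need 0 < soft_limit <= hard_limit")
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.in_flight = 0

    def try_acquire(self, critical: bool) -> bool:
        """Admit a request if there is room for its priority class."""
        limit = self.hard_limit if critical else self.soft_limit
        if self.in_flight >= limit:
            return False  # shed: caller should return a cheap fallback or 503
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Mark one admitted request as finished."""
        self.in_flight = max(0, self.in_flight - 1)


shedder = LoadShedder(soft_limit=2, hard_limit=3)
admitted = [
    shedder.try_acquire(critical=False),  # admitted, 1 in flight
    shedder.try_acquire(critical=False),  # admitted, 2 in flight
    shedder.try_acquire(critical=False),  # shed, optional work is over its limit
    shedder.try_acquire(critical=True),   # admitted, 3 in flight
    shedder.try_acquire(critical=True),   # shed, hard limit reached
]
```

This version is single-threaded for clarity. A real service would need a lock or an atomic counter around `in_flight`, and the caller has to call `release` when each admitted request finishes.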

Don’t “learn bad lessons” from retrospectives like “never touch the X!”

Capacity planning – do it just in time but be realistic.  Figure out real buffers. “Facebook with their huge custom datacenters is all nice but that’s not us.”

Hardware has lead time. [Ed: That’s why it’s for punks]

This is a marathon not a sprint.  You have to keep yourself healthy or you’ll crash.  Maintain your systems and yourself.

Filed under Conferences, DevOps