Keep Austin Agile 2018 Trip Report

This Thursday, my boss (the SVP of Engineering at AlienVault) and I went to Keep Austin Agile, the annual conference put on by Agile Austin, the local agile user group network!  I used to run the Agile Austin DevOps SIG until I simply ran out of time for all the community stuff I was doing and had to cut back.


It's a super professional event for a practitioner conference; it was held at the JW Marriott in downtown Austin for one day only and sold out at 750 people. I figured I'd share my notes in case anyone's interested. All the presentations are online here, and video is coming soon.

DevOps Archaeology

My first session was DevOps Archaeology by Lee Fox (@foxinatx), the cloud architect for Infor. The premise is that it's an unfortunately common task in the industry to have to "go find out how that old thing works," whether it's code or systems or, of course, the hybrid of the two.  So he has tips and tools to help with that process.  Super practical.  Several of my engineers are working on projects that are exactly this: "Hey, that critical old system someone pooped out three years ago and then moved on from – go figure it out and operationalize it."

I basically wrote down the list of cool tools that help with this process…
  • CodeCity – visualizes your code as a city
  • Gource  – visualizes the evolution of your codebase over time
  • SignatureSurvey – scans for patterns in code
  • Logstalgia – visualizes historical traffic to a Web endpoint
  • Proxies – setting up proxies helps you understand what's going on, at an even deeper level than flow logs.
  • Monitoring – you know, all the usual monitoring tools.
  • Logs – you know, all the usual log aggregation tools.
Then he had a bunch of AWS-specific tools too.  All our stuff is in AWS, so this was super useful.
  • CloudTrail – AWS API audit logs, yeah.  We pump our CloudTrail into our own USM Anywhere instance to report on weirdness (see the sketch after this list for what querying it directly looks like).
  • Config – a newer service; have it report on resources that aren't tagged right, volumes that aren't encrypted, or whatever rules you want to set up.  Nice!
  • Trusted Advisor – well, don't trust it too much; I've learned the hard way that there are lots of limits and things it doesn't know about.  But useful.
  • Macie – "machine learning" (I always put that in scare quotes nowadays because of its overuse) to identify weirdness in your environment: detecting high-risk CloudTrail events, activity from unusual locations, and so on.
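As promised above, here's roughly what poking at CloudTrail directly looks like – just a minimal boto3 sketch, assuming you have AWS credentials configured; the event name and time window are placeholders, not anything from the talk or our actual USM Anywhere reporting:

```python
# Minimal sketch: look up recent CloudTrail events for one suspicious API call.
# Assumes boto3 is installed and AWS credentials/region are configured.
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail")

response = cloudtrail.lookup_events(
    LookupAttributes=[
        # Placeholder event name -- swap in whatever "weirdness" you're chasing.
        {"AttributeKey": "EventName", "AttributeValue": "DeleteSecurityGroup"}
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
    EndTime=datetime.now(timezone.utc),
    MaxResults=50,
)

for event in response["Events"]:
    # Each record tells you who did what, when, and via which API call.
    print(event["EventTime"], event.get("Username", "unknown"), event["EventName"])
```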

And, some discussion of testing, config management, and so on.  Great talk, I will look into some of these tools!

Brewing Great Agile Team Dynamics

This talk, by Allison Pollard (@allison_pollard) and Barry Forrest (@bforrest30), wasn't really my cup of tea. It used a basic four-quadrant personality survey to break us up into four categories – Compliance, Dominance, Steadiness, or Influencer.  Then we spent most of the time wandering the room in a giant circle doing activities that each took 10 minutes longer than they needed to.

So I'm fine with the four-quadrant thing – but I was taught a similar model when starting my first job at FedEx back in 1993, so it wasn't exactly late-breaking news.  (Driver, Analytical, Amiable, and Expressive were the four, IIRC.)  As a new person it was illuminating and made me realize you have to think about different personalities' approaches and not consider other approaches automatically "bad." So yay for the concept.

But I'm not big on the time-consuming agile-game format that shows up at lots of these conferences. "What might turn you off about a Dominant person?  That they can be rude?" OK, good mini-wisdom, but should it take 10 minutes to get to it? Maybe it's just because I'm a Driver, but I get extremely restless in formats like this. A lot of people must like them, because agile conferences have them a lot, but they're not for me.

Modern Lean Leadership

Next up was Modern Lean Leadership by Mark Spitzer (@mspitzer), an agile coach. I love me some Deming and am always looking to improve my leadership, so this one drew me in for this time slot.

First, he quoted Deming's 14 points for total quality management.  For the record (quoted from asq.org):

  1. Create constancy of purpose for improving products and services.
  2. Adopt the new philosophy.
  3. Cease dependence on inspection to achieve quality.
  4. End the practice of awarding business on price alone; instead, minimize total cost by working with a single supplier.
  5. Improve constantly and forever every process for planning, production and service.
  6. Institute training on the job.
  7. Adopt and institute leadership.
  8. Drive out fear.
  9. Break down barriers between staff areas.
  10. Eliminate slogans, exhortations and targets for the workforce.
  11. Eliminate numerical quotas for the workforce and numerical goals for management.
  12. Remove barriers that rob people of pride of workmanship, and eliminate the annual rating or merit system.
  13. Institute a vigorous program of education and self-improvement for everyone.
  14. Put everybody in the company to work accomplishing the transformation.

His talk focused on #7 and #8 – instituting leadership and driving out fear.

Many organizations are fear-driven. Even if it's subtler than the fear of being fired, the fear of being proven wrong, of losing face, and so on is a very real inhibitor.  Moving the organization from fear to safety to awesome is the desired trajectory.

He uses "Modern Agile" (Modernagile.org), which I hadn't heard of before, but its principles are aligned with this:

  • Make People Awesome
  • Make Safety a Prerequisite
  • Experiment & Learn Rapidly
  • Deliver Value Continuously

So how do we create safety? There’s a lot to that, but he presented a quality tool to analyze fear and its sources – who cares and why – to help.

Then the next step is to determine mitigations, how to measure their success, and how to timebox them. I'm a big fan of timeboxing; it's critical to making deeper improvements without getting stuck down the rabbit hole.  When my engineers ask "well, but how much should I go improve this code/process?" I tell them to pick a reasonable timebox and then do what they can in that window.

OK, but once you have safety, how do you make people awesome? Well, what is awesome about a job?  Focus on those things.  You can use the usual Lean techniques, like stop-work authority, making progress visible (e.g. days without an incident), using the Toyota kata for continuous improvement, using Plan-Do-Check-Act…

In terms of tangible places to start, he focused on things that disrupt people’s sleep at night, doing retros for fear/safety, and establishing metric indicators as targets for improvement.

Another good talk!

How The Marine Corps Creates High-Performing Teams

Andy McKnight gave this interesting talk explaining how the Marines build culture and teamwork, so that we might adapt their approach to our own organizations.  I do like yelling at people, so I am all in!

Marine boot camp is partly about technical excellence, but also about steeping recruits in the organization's culture. (In business, new-hire orientations have been shown to deliver strong benefits – as has mentoring after the fact.)

What is culture?  It is the shared values, beliefs, and assumptions that govern how people behave.

Most organizations have microcultures at the team level.  But how do you make a macroculture?  Culture comes first, teambuilding second.

  1. Shift your org structure to align with the value stream instead of functional silos.
  2. Measure as a team.

The 11 Marine Corps Leadership Principles:

  1. Know yourself and seek self-improvement.
  2. Be technically and tactically proficient.
  3. Develop a sense of responsibility among your subordinates.
  4. Make sound and timely decisions.
  5. Set an example.
  6. Know your people and look out for their welfare.
  7. Keep your people informed.
  8. Seek responsibility and take responsibility for your actions.
  9. Ensure assigned tasks are understood, supervised, and accomplished.
  10. Train your people as a team.
  11. Employ your team in accordance with its capabilities.

Who should be on the scrum team?  Those necessary to get the work done.

The two Leadership Objectives – mission accomplishment and team welfare – are a balance.

There was also discussion of Commander's Intent and delegating decisions down to the lowest effective level.

Good discussion, loads of takeaways. At my work I would say we are working on developing a macroculture but don’t currently have one, so I’ll be interested to put some of this into practice.

Agile for Distributed Teams

And finally, Agile for Distributed Teams by Paul Brownell (@paulbaustin). At my work we have distributed teams, and it's a challenge. There's lots of stuff in the slides; my takeaways are:

  • People’s biggest concern – not understanding enough context, not sharing values
  • Use multiple communication channels – video, chat, email.
  • Get F2F time.  Quarterly.  Make it happen. Use ambassadors.
  • Expose the team to other parts of the org; get users involved
  • Establish rules of engagement – hours, channels, etc. for clarity.
  • Teams will have local subcultures – make a space for shared learning, encourage lateral communication, emphasize early progress.
  • Use icebreakers in standups etc. – e.g., share something about your week
  • Teambuilding – Slack channels, scavenger hunts
  • Sprint planning – one or two meetings? Involve the team.
  • Standups – try having everyone on headsets to level the playing field between in-room and remote folks.
  • Online whiteboards
  • Retros – be creative, get written feedback ahead of time

All right!  Four of five sessions made me happy, which is a good ratio. Check out these talks and more on the Keep Austin Agile 2018 Web site!  It's a large and well-run conference; consider attending even if you're not an "agile coach"!

 


Released! SRE, Monitoring and Observability!

Well, we haven't had a lot of spare blogging time, but we've been busy – the agile admins have been hard at work on some more LinkedIn Learning/lynda.com video courses to help make DevOps more accessible for the common man and woman.

James and I did DevOps Foundations: Site Reliability Engineering, which explains our view of what SRE is and its position in the DevOps arena. This rounds out our "DevOps 201" series, where we delve into the three practice areas of DevOps: continuous delivery, infrastructure as code, and SRE.

Monitoring is a big part of SRE, but too much to cover in the scope of one course – so at the same time, Peco and I filmed a companion course, DevOps Foundations: Monitoring and Observability!  Like the CD and IaC courses, this alternates theory with demos.

Along with the recent DevOps Foundations: Lean and Agile course I did with Karthik, we feel like we've now completed a curriculum that can introduce you to all the major areas of DevOps and prepare you to grow from there. (Link to lynda.com playlist.) The guys are doing other courses with lynda as well – we've kinda gotten addicted to doing them!  You can check them all out by your favorite Agile Admin:

I love talking to the folks I meet at conferences and whatnot who have done the courses; let us know what other areas you’d like to get the Agile Admin learning treatment!

We're not a consultancy or anything – just four practitioners here in Austin who love giving back to the community when we're not doing our day jobs. We get some royalties per click from the classes, but other than that we don't have anything to sell you. We got into this to help people make sense of the confusing DevOps landscape, and we'll keep doing it as long as it seems like it's meeting that goal – so we need your feedback to let us know whether we should keep going, and if so, on what.


Monitoring and Observability

Ah, observability, the new buzzword of the day. Monitoring vendors aplenty are using the word to basically mean "better monitoring!" You know, #monitoringlove, not #monitoringsucks. Because monitoring doesn't help with debugging and doesn't have app instrumentation, right?

Well, I have to say “bah” to that.  So here’s the thing.  I’m an electrical engineer by education, and I spent a lot of time working at National Instruments, an engineering test and measurement company.  You may be surprised to know these terms have actual definitions that don’t require Twitter arguments to discover.

Monitoring is an activity you perform. It’s simply observing the state of a system over a period of time.

Why do we monitor? For three reasons, in general.

  • Problem Detection – you know, alerting, or seeing issues on dashboards.
  • Problem Resolution – root cause and troubleshooting.
  • Continuous Improvement – capacity planning, financial planning, trending, performance engineering, reporting.

How do we monitor?  Well, that’s called instrumentation. You can instrument your systems and get CPU and stuff, you can use synthetic probes, you can use JavaScript bugs to get end user monitoring, you can emit metrics from applications, you can introspect services and apps via whatever parts are exposed (from JMX to nginx stats to sysdig traces), you can take network traces… (Some folks are similarly trying to redefine “instrumentation” to just mean application instrumentation, which is lame, and in defiance of the fact that application performance management tools that do app instrumentation have existed for decades.)

You can instrument metrics or events; metrics have a certain sampling frequency and resolution…
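To make that metric/event distinction concrete, here's a tiny illustrative sketch – the names and numbers are made up, not from any particular tool: an event is emitted once per occurrence, while a metric is a value sampled at some fixed frequency and resolution.

```python
# Illustrative only: events are emitted per occurrence; metrics are sampled on a schedule.
import json
import random
import time
from datetime import datetime, timezone


def serve_request():
    """Pretend to handle a request and emit one event record for it."""
    latency_ms = round(random.uniform(5, 50), 2)
    # Event: one structured record per occurrence, with whatever context you have.
    print(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": "request_served",
        "latency_ms": latency_ms,
    }))


def read_queue_depth():
    """Stand-in for reading a real gauge (queue depth, connection count, etc.)."""
    return random.randint(0, 100)


for _ in range(3):
    serve_request()
    # Metric: sampled once per second here -- anything that happens between
    # samples is invisible at this resolution.
    print("queue_depth", read_queue_depth())
    time.sleep(1)
```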

So what is observability?  This isn’t a new term. It comes from system control theory. You know, the stuff that makes your A/C system and electrical plants and your car work.

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.

Observability is a property of a system. You can monitor a system using various instrumentation, but if the system doesn’t externalize its state well enough that you can figure out what’s actually going on in there, then you’re stuck.

So is observability hippy bullcrap?  No, of course not. In a DevOps world, it’s very important that the apps and systems concentrate on making themselves both observable and controllable (I leave it to the reader to research controllability, unless I get agitated enough to post about that too). Do you make yourself “easy to monitor”?

Externalizing custom metrics contributes to observability (you know, like with dropwizard metrics).  So does good logging.  So does proper architecture!  Take a system that sticks all kinds of messages into one message queue rather than using separate queues for separate types – the latter is more observable; you can more readily see how many of what is flowing through.  (It’s more controllable too, as you can shut off one queue or another.)
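As a concrete example – and this is a minimal sketch using Python's prometheus_client as a stand-in for the Dropwizard Metrics idea, with made-up metric names rather than anything from a real system – externalizing custom metrics can be as simple as:

```python
# Minimal sketch: externalize custom metrics so monitoring can infer internal state.
# Uses prometheus_client as a Python stand-in for Dropwizard Metrics; names are made up.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

ORDERS_PROCESSED = Counter(
    "orders_processed_total", "Orders processed, labeled by result", ["result"]
)
QUEUE_DEPTH = Gauge("work_queue_depth", "Items currently waiting in the work queue")

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for whatever scrapes it
    while True:
        ORDERS_PROCESSED.labels(result="ok").inc()
        QUEUE_DEPTH.set(random.randint(0, 25))  # stand-in for the real queue depth
        time.sleep(1)
```

Point whatever instrumentation you use at that endpoint and the internal state – orders flowing, queue backing up – is visible from the outside, which is the whole point.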

Making your system observable is therefore important, so that if you monitor it with appropriate instrumentation, you understand the state of the system and can make short or long term plans to change it.

While a monitoring tool can definitely contribute to this via its innovation in instrumentation, analysis, and visualization, in large part observability is a battle won or lost before you start sticking tools on top of the system. It's very important to take it into account when designing and implementing services. No tool is going to "give you" observability – that's the usual silver-bullet fallacy from someone who wants to sell you something.

I'm not saying every vendor is using the term wrongly (in fact, I just came across this New Relic post that is very well done), but I have to say I am less than impressed when common engineering terms are so widely misused and misunderstood in our industry.

Would you like to know more?  Peco and I are working on a new lynda.com course on monitoring and observability!  There'll be real engineering, a broad survey of the different kinds of monitoring instrumentation, tips on implementation and use… We've both been using and/or building monitoring tools for decades now, so we hope to have some useful info for you.


CNCF and K8s 101s

I never make New Year’s resolutions, but I want to do something different for 2018!

One thing I've been learning a lot about over the past couple of years is Kubernetes and the CNCF ecosystem around it, and I often find myself having a hard time keeping up. There are almost weekly releases across the many projects, and getting-started content for all the new tools and technology is hard to find.

So! I plan to do quick 101 blogs on different topics under the Container/Kubernetes/CNCF umbrella. My first article will be on Prometheus – the monitoring tool that integrates GREAT with k8s! It'll be based on my GitHub code here: https://github.com/karthequian/prometheus-demo (shhh, sneak peek).

But, I need your help! Give me a list of things you are confused about in the container space, or want more info on, and I’ll be happy to do the legwork on it!

So, give me input here, or on twitter!


Released! Learning Kubernetes and K8s: Native Tools

 

I've been working on the managed Kubernetes Engine at Oracle, as described here by my StackEngine CEO, Bob Quillin.

Being knee-deep in the Kubernetes and CNCF ecosystem is very exciting, and it reminds me a lot of the early days of the Docker ecosystem. KubeCon in December had a lot going on, with a plethora of projects and lots of vendors. I believe Kubernetes will become the de facto orchestration and IT platform that many large enterprises use when they look to modernize their architecture – whether they end up all Kubernetes, all cloud native, serverless, or somewhere in between.

And speaking of K8s, my Lynda courses on Kubernetes just released! I filmed them late last year at Lynda's campus in Carpinteria, CA – Learning Kubernetes and Kubernetes: Native Tools!

Learning Kubernetes covers all the information you'll need to get started using Kubernetes – the concepts, examples, installation, and everything else you'll need to get rocking with k8s!

Kubernetes: Native Tools is a shorter course that covers the different tools available in the k8s ecosystem.

Let me know what you think – and reach out if you have questions or issues! K8s might be overwhelming initially, but stick with it and it'll make your container management life so much easier!


DevOpsDays Summit Austin 2018 – “DevOps Unplugged”

Hey all!  We’re starting work on next year’s DevOpsDays Austin – our seventh here in the ATX.  Many of you have come out to the event (or another of the great DevOpsDays around the world). Well, we have some changes in store this year!

Last year's DevOpsDays Austin, "Monsters of DevOps," was bigger than ever and had a stadium-rock theme – we had a huge venue, all the DevOps VIPs we could pull down (including the first time all four authors of The DevOps Handbook managed to get together at an event), multiple content tracks, killer swag, great food, a hackathon, and the best happy hour I've attended at a conference, and we invited in and comped local user groups to give talks… all part of our continuing trajectory to make DoDA more all-encompassing and awesome.

But – every year we sit down and discuss vision before we launch into the conference.  What do we want to accomplish and why?  Who are we serving and why?  Why are we, personally, putting in huge amounts of unpaid work to serve the community? “Because it’s there and we did it last year” isn’t a good answer, so we like to really put some thought into it.

This time when we talked about it, first in our core group and then with the rest of the 2017 organizers, we realized that we've been concentrating on "bigger," but we've been putting more and more money and effort into the parts of the event that aren't really of high DevOps value. Here in Texas, it's easy to conflate bigger with better, since we're both the biggest and the best!  But we're not sure that's right. Many of the more expert people we know here in Austin don't really come out to the event anymore, unless they're giving a talk or recruiting for their current gig.  Talks and openspaces have stayed focused on introducing new people to DevOps, enterprise folks, "horses and donkeys," and so on.

And as we talked, we asked ourselves, "Well – what do we personally get out of the conference nowadays as attendees?"  The answer was "not much." Openspaces are huge and end up being a couple of people talking.  Talks are either pretty familiar from the conference circuit or also designed for new folks.  We have more content, but it's more passive content – sit and watch.  It's good for the newbies but not as much for the experienced folks.

We contrasted this with the first couple of DevOpsDays we went to in Silicon Valley.  The first couple were just in a big auditorium at LinkedIn.  There weren't any sponsor booths. More of the event was focused on the openspaces and interaction between the highly driven participants. We ate box lunches wherever we could perch in the parking lot outside, and swag was just a t-shirt.  Heck, the third one was in a weird abandoned building Dave Nielsen had access to; we had to carry our own chairs around to talks, and the food and stuff was in a concrete-and-cage loading dock. But those are the events we got the most out of.

Therefore, this year DevOpsDays Austin is going to go to what we call a “Summit” format.  We’re reducing the size of the event, and focusing more on local, motivated practitioners.  What does this mean?

  1. No sponsor tables.  We'd love sponsors to participate, but in recent years we've gotten more folks who have either just sent aggressive marketers, or sent people we enjoy and then locked them down behind tables. So we've come up with a sponsorship package that gets them exposure and value but lets them actually participate in the event.  Folks who just want to churn leads will self-select out.  The sponsorships are less expensive, and we'll just have venue food etc. instead of premium.
  2. No preselected talks.  Well, OK, maybe we'll have one keynote a day.  But I went to a ProductCamp here in Austin and they did something brilliant – they had an RFC but didn't do a final selection; finalists show up and the audience votes on which talks they want to hear (kinda like openspaces, but more prepared).  This means people who say "well… I'll come to your event if I can talk (or sponsor if I can talk, or…)" will self-select out. You come because you want to be here, and you can give a talk!
  3. Smaller headcount.  We're lowering the cap (including sponsors, organizers, and volunteers) to 400. We're going to get openspaces back to being the kind of highly engaged discussions that make them so valuable.  We're going to be up front with people that attendees are expected to engage.  DoD used to be the only thing around to learn from.  But now, if you're an enterprise person who wants to have some DevOps talked at you, you have a variety of options: you can go to DevOps Enterprise Summit (also a great event), to another DevOpsDays using the conference format like the one in Dallas, or to one of a dozen events that are either completely DevOps or DevOps-tracked.  But here in Austin this year, we need something where the unicorns can also have an event meaningful to them, so they can gather and refresh on what's going on. Not to say only "unicorns" are welcome, but frankly we'd prefer people only come out if they intend to discuss, share, and engage; this will not be a passive-learning-friendly event.
  4. No streaming.  Every year we put a lot of work and money into live-streaming and/or recording the event.  But it's often problematic, and it doesn't get viewed a lot – there's so much content out there now.  Even worse, we end up degrading the experience of real attendees to meet the requirements of broadcast – space, money, schedule, the presenter having to stay in a little box… So we're not going to do it.  You want to participate?  Come out and participate.

But How Can This Work???

That was everyone’s initial reaction to this plan.  But that’s silly – it has worked.  We’re just doing things that DevOpsDays has already done, that ProductCamp has already done, and so on. It’s just not what’s become customary.  After the organizers had a little time for it to sink in, they all rallied behind it with a vengeance.

We've run the numbers, and just the basic $200/head attendee fees can pay for the venue, basic food, and a shirt, even if we get zero sponsors.  (We won't have zero sponsors – we just put our sponsor page up, and someone bought in within the first hour it was live.) As we get more funding we'll pump up the event, but we'll deliberately focus on the core experience of highly skilled techies learning from each other, instead of adding distractions.

How Dare You Dis My Format???

This is the format we'd like to try this year.  Other events will use other formats, and that's fine. Here at DoDA we try something different every year!  We were the first to have multiple content tracks (over the complaints of some purists).  We added a hackathon, we added a local user group track… Last year we went big with a vengeance, and it was cool.  Now we're going to go small and exclusive, and that'll be cool.  Next year, it'll be different again. Whatever your event is doing, more power to you – don't confuse us having a vision we believe in with us thinking you're "wrong."

Come on down!

We'd love to see everyone out at DevOpsDays Austin 2018!  Come ready to interact and share.  Come ready to give a talk, with the risk it won't make the cut.  Come sponsor your company – you just won't have a table to lounge at. This change has gotten us excited about running our seventh DevOpsDays, and we bet you'll love it!


Assigning Fault To Human Error Is A Human Error


We all know from DevOps blameless retrospective wisdom that there is no such thing as a single “root cause.”  One of the most common root causes people like to assign blame to is “human error”.  Not to mince words, this is usually political, buck-passing CYA of the highest order.

I just read a great article on the recent U.S. Navy ship collisions that I wanted to pass on.  If you have been keeping up with the news, there has been a rash of Navy ships colliding with other vessels, causing fatalities. When you Google it, you see a whole bunch of "Navy attributes it to human error…"

But now go read this article, Something's Wrong In The Surface Fleet And We're Not Talking About It, written by Capt. Michael Junge, an experienced naval officer. The TL;DR is that you can say "human error" all you want, fire someone, and call the case closed, but these accidents stem from systemic understaffing of Navy surface ships and massive shortfalls in training and maintenance – a leading indicator of even worse to come should an actual wartime deployment be necessary.

Even in engineering, we are tempted to push the problem down onto the person who made the mistake.  Fully engaging with the system that created the need for the action that led to the mistake, the lack of validation that makes mistakes possible, and so on is hard thinkin'.  It is threatening when people point out flaws in processes and systems and code you had a hand in.  But the only way to actually improve your situation is to soberly assess what the actual contributors to issues are, and work toward fixing them.
