Author Archives: Ernest Mueller

About Ernest Mueller

Ernest is the VP of Engineering at the cloud and DevOps consulting firm Nextira in Austin, TX.

Trying To Record And Replay Browser Tests… FML

I’m working for a startup right now and we don’t have a huge excess of development staff.  Our devs have been implementing UI testing in Cypress, but we also need some wide cross-browser testing of our front end Angular apps – we’d already found a couple blocker bugs on Edge and IE largely by accident.  The devs are all busy devving, so I figured I’d take that on. I said, “Well, there’s products where you can just click to record a UI session and replay it in other browsers without writing a bunch of code, let’s try that out.”  Most everyone has a free trial nowadays so I could see which ones were best. Then the pain began.

Sauce Labs – Part 1

I had used Sauce in a previous life, when we had a bunch of Robot Framework/Selenium tests and I liked it.  So I went there first.  Unfortunately, they have no record/replay capability, verified by their support, so I moved on.

But I came back later because I had found that there was a Selenium IDE that’s a record and playback tester that you can integrate with Sauce by using selenium-side-runner.

Selenium IDE is very cool, its killer feature is that as it records it copies various ways to address an item on the screen – css, xpath, full xpath – and when it replays if the first one doesn’t work it tries the next one and if that works tells you “hey you should update this test.” That’s great because UI testing is shitty and unreliable at best, and once you have Angular generating ever-changing ids for elements it is even worse.  The only bad thing is you have to go add assertions in manually afterwards.

So in fairly short order, I managed to get a reproducible Selenium IDE script that exercised our Angular app and works.  The app’s just like 7 screens of form fill, it’s not crazy.

Well, then I tried to save it as a “.side” project and feed it through Sauce by using selenium-side-runner, which is just:

npm install -g selenium-side-runner

selenium-side-runner --server <sauce-url> -c "browserName='chrome' version='latest' platform='macOS 10.14'" 'Paul Precision.side'

You get that sauce URL that has credentials embedded under User Settings/Driver Creation in their UI.
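If you'd rather not paste the credentialed URL into every command, something like this should work. It's a minimal sketch, not anything Sauce ships: sauce_url is a hypothetical helper, SAUCE_USERNAME and SAUCE_ACCESS_KEY are assumed variable names, and the user:key layout in front of the host is my assumption about what the credentialed URL looks like. The host and path are the ones used in the script later in this post.

```shell
# Hypothetical helper: build the Sauce server URL from environment variables
# instead of hard-coding the credentialed URL in commands and scripts.
# SAUCE_USERNAME and SAUCE_ACCESS_KEY are assumed variable names.
sauce_url() {
    if [ -z "${SAUCE_USERNAME:-}" ] || [ -z "${SAUCE_ACCESS_KEY:-}" ]; then
        echo "SAUCE_USERNAME and SAUCE_ACCESS_KEY must be set" >&2
        return 1
    fi
    # Assumed layout: user:key@host, matching the host used elsewhere in this post
    printf 'https://%s:%s@ondemand.saucelabs.com:443/wd/hub\n' \
        "$SAUCE_USERNAME" "$SAUCE_ACCESS_KEY"
}
```

You would then pass "$(sauce_url)" as the value of --server, which also keeps the secret out of anything you commit.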

Unfortunately once I push it to sauce (starting on the same OS/browser, which you go get the tokens for from their Platform Configurator) – problems. The player is great, it shows the video (even live while testing) synced to the step taking place (unfortunately since I’m piping it in, it’s not showing the steps in the test syntax, but in raw Selenium execution syntax).

I fixed most of them by going and changing selectors away from CSS to xpath, then sitting there iterating with chrome dev tools and the IDE trying new ways to use an item that works in chrome then works in selenium IDE and then… Doesn’t work on Sauce. I have gotten it 90% working but the last 10% is blocking me.

CrossBrowserTesting

Next I tried SmartBear’s CrossBrowserTesting.com. An all-in-browser recorder that worked great!  And then the replays didn’t work.  I messed with it a while and contacted their support, who said “Oh yeah it doesn’t do angular, it’s for static pages.”  So on to the next one. Who uses static pages, this is 2020?

The interface is nice enough, editable steps next to a running video (though not synced up).

Actually looking at it closer I bet I could do the same “edit all the locators” deal and try to get it to work but… My 7 day trial is over (a week shorter than the other options) so I guess I can’t try.  It didn’t do the nice multi-locator guessing Selenium IDE did but it does seem to have several options in a dropdown while I edit the tests, and the recorder is integrated into the offering so that’s nice – the UI was good overall. Unfortunately the super short trial and the presales support saying “Angular? Go away!” prevented me from really seeing if it can work for us.

GhostInspector

Demoralized, I head to Twitter, and someone recommends GhostInspector.  You record it with a Chrome plugin and then replay in the browser – video, but then it shows the editable steps next to a screenshot showing % change from the last screenshot (the steps aren’t synced to the video, which would be better) .  You can do assertions while you’re recording.  And the replay works the first time and every time – Hallelujah!

And then I look to set up cross-browser and discover they only support Chrome and Firefox, and to even do that in an automated manner you have to duplicate the entire test suite.  I was so disappointed, it worked perfectly otherwise.

Seriously y’all if you add more browsers I’ll pay you immediately for this.

EndTest

Determined to make this happen I find EndTest and, after verifying they support a full OS/browser matrix, try them. They also use a browser plugin recorder like GhostInspector.

I’ll be honest, the UX is terrible.  Besides the 1990s colored icons, everything is always a click away – you have to watch the replay video separately from looking at the logs from looking at the steps from editing the steps. Everyone, the magic combination is editable steps to the left, running video and logs to the right, highlight the step you’re on as it plays. Anything else harms your usability.  And also while editing steps you can’t add a step in just anywhere, you have to add it at the end of your 100 steps and then drag it up page by page…  And often when you do that you just get “error saving test” messages for no reason. Argh.

But… The recording is quick and then it is semi working.  Tempting.  Now I start the iterative edit-replay-debug cycle.  It is slow. You get to give your steps a name but those names don’t show up in the test output, because why would they.  After an afternoon of fiddling, I’m halfway through a 7 screen flow. Their support was nicely proactive and reached out to me about an error (I was looking for text with a $ in it and you can’t do that, but you can define a variable and then use that…)

It’s at this point I also find the Selenium IDE and bring Sauce back into the mix.

Keep Trying – Sauce and EndTest

Next, what I was doing was fiddling with the steps in the Selenium IDE, then pumping those changes both into Sauce via CLI and manually editing them into EndTest’s UI, desperately hoping to get one to pass (they don’t act the same under the same inputs for whatever reason).

Locator by locator I grind through making the test work.  I have a lot of trouble where we use multiple option mat-selects, because they “stay open” while you select items and I can’t get them to close.  I try sending ESCAPE keys but can’t get that to work, I try double clicking on other things…  One of our devs figured out the magical thing to click on was the overlay backdrop (css=.cdk-overlay-backdrop) to close the damn multiselect box.

This takes several grueling days.  I ask support folks for help but don’t really get any useful traction.  Finally, I get a magic combination in the Selenium IDE that also works in Sauce!  I try the same ones in EndTest and they don’t work.

It’s super frustrating.  The same locator doesn’t work in all 3 tools, often forcing me to choose a less portable option – instead of something resilient to change like “xpath=//span[contains(.,'Visual Line of Sight')]” – which works in some cases – I end up having to use something like xpath=//mat-option[@id='mat-option-87']/mat-pseudo-checkbox (and sadly in Angular Material those IDs randomize unpredictably). Like, there will literally be two identical-except-for-the-text-and-ids-in-them widgets one after another and one kind of locator works on the first one and not on the second. No idea why.
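Since those generated ids are the locators most likely to break on the next build, it can help to at least know where they are hiding in a saved project. Here is a rough sketch, assuming the .side file is plain text you can grep. The helper name find_brittle_locators and the pattern are mine, not part of Selenium IDE, and the pattern only covers ids shaped like the mat-option-87 example above.

```shell
# Hypothetical helper: list the lines of a .side file whose locators embed
# auto-generated Angular Material ids such as mat-option-87.
# The pattern is an assumption; widen it if your app generates other id shapes.
# Exits non-zero (grep's behavior) when nothing matches.
find_brittle_locators() {
    grep -nE 'mat-[a-z-]+-[0-9]+' "$1"
}
```

Running find_brittle_locators 'Paul Precision.side' prints each matching line with its line number, which gives you a to-do list of locators to swap for text-based ones where the tools allow it.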

Sauce Labs – Part 2

OK, so of all the options the only one that actually works for me and will allegedly do crossbrowser testing is an unsupported combo of Selenium IDE and Sauce run off the command line.  A couple sources I found over the course of this:

Not optimal, but at this point I’m a week in and taking what I can get.  Let’s try an actual crossbrowser matrix now.  Bonus hacky Bash script:

#!/usr/bin/env bash

# Selenium IDE projects to run
tests=("Paul Precision.side")

# Capability strings, one per OS/browser combination (values from the Sauce Platform Configurator)
platforms=("browserName='chrome' version='latest' platform='macOS 10.14'"
           "browserName='chrome' version='latest' platform='Windows 10'"
           "browserName='chrome' version='latest' platform='Linux'"
           "browserName='MicrosoftEdge' version='latest' platform='Windows 10'"
           "browserName='safari' version='latest' platform='macOS 10.14'"
           "browserName='firefox' version='latest' platform='macOS 10.14'"
           "browserName='internet explorer' version='latest' platform='Windows 10'")

# Run every test against every platform
for test in "${tests[@]}"
do
        for platform in "${platforms[@]}"
        do
                echo "Running ${test} ${platform}"
                echo
                # URL is quoted so the <secrets> placeholder is not parsed as a shell redirection
                selenium-side-runner --server "https://<secrets>@ondemand.saucelabs.com:443/wd/hub" -c "${platform}" "${test}"
        done
done
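With seven platforms, scrolling back through runner output to see which ones passed gets old fast. Here is a sketch of a variation that keeps going after a failure and prints a one-line verdict per platform. Everything named here is my own invention for illustration: run_matrix is a hypothetical function, RUNNER is a hypothetical override so the loop can be dry-run with a stub, and SAUCE_SERVER is an assumed variable holding the credentialed URL. It relies only on the runner exiting non-zero when a test fails.

```shell
# Hypothetical variation on the matrix script: run one .side file against
# each capability string, report PASS/FAIL per platform, and return non-zero
# if any run failed.
# RUNNER (assumed override) defaults to selenium-side-runner.
# SAUCE_SERVER (assumed variable) holds the credentialed Sauce URL.
run_matrix() {
    if [ -z "${SAUCE_SERVER:-}" ]; then
        echo "SAUCE_SERVER must be set" >&2
        return 2
    fi
    test_file=$1
    shift
    failed=0
    for platform in "$@"; do
        if "${RUNNER:-selenium-side-runner}" --server "$SAUCE_SERVER" -c "$platform" "$test_file"; then
            echo "PASS  $platform"
        else
            echo "FAIL  $platform"
            failed=$((failed + 1))
        fi
    done
    echo "$failed of $# platform runs failed"
    [ "$failed" -eq 0 ]
}
```

From the script above you would call it as run_matrix "${test}" "${platforms[@]}" inside the outer loop. The non-zero return also makes it usable as a CI step if you ever get that far.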

Chrome on MacOS – works.  Chrome on Windows – works.  Chrome on Linux – for some reason can’t find a selector early on.  Edge on Windows – weird proxy 400 error, won’t even load the page.  Pretty sure that’s not my fault.  Safari on MacOS – can’t click on the first things it needs to click on.  Firefox on MacOS – same error?  Really?  Now IE… Out of minutes (despite the UI telling me .6 automated hours remain).

I have tried all these os/browser combos manually and they work.

So my conclusion is all these suck and I guess I just need to pay manual QA people to click on our app.  Great.  Or for Cypress to get off their butts and add cross-browser support, which they’ve been saying “is coming” for three years now.

We’re a startup and time is money, so in the end cross-browser testing is not worth the hassle in all these solutions.  But it is important and I’d love someone to make a solution that actually works for it.

P.S. Please do not suggest another solution unless it has a) UI record and replay capability and b) is cross browser (Chrome/Firefox/Safari/IE/Edge on Windows/MacOS/Linux). I know there’s a million browser automation testing tools out there, that’s not what I need.

Update

I put some more time into this and got some working options – see Record and Replay Browser Testing, Take 2!

Filed under DevOps

Ask A Tech Manager

We had a really interesting joint CloudAustin/Austin DevOps meeting this week – a tech manager panel!  Ask them anything!  We had 92 people attend and we had to herd them all out of the building with sticks at the end because everyone had so many questions that even at two hours it was still going strong.

My takeaways:

  • Understanding managers’ viewpoint and goals is critical to an individual engineer when looking to get hired or promoted or whatever.
  • Managers want you to be successful – you being successful vs you not being successful (and/or leaving or being let go) is a huge win for them in all ways.
  • Looking for a job?
    • Your resume should be tuned to a job and either through its top section or a cover letter tell a story. Hiring managers get 100-200 applications/resumes per job. They don’t expect you to have everything on the job application – “50% is fine” was the consensus. But they’re doing a first cut before they talk to you, and a lot of people spam resumes, so if your resume doesn’t clearly say why you, add something that does.
    • For a Cloud Engineer position, for example, there’s a big difference between a random UNIX admin resume and a random UNIX admin resume that says “I’m excited about cloud and working towards an AWS certification on the side and really want to find a new job I can do cloud in.” The first one is discarded without comment, the second one can move ahead if they’re willing for people to learn – and given the “50%” thing, they are generally willing for people to learn, if those people are willing to learn!
  • Interviewing for a job?
    • Everyone hates every different interviewing tactic (see Hiring is Broken, and we have the Ultimate Fix) – individual interviews, panel interviews, whiteboard design, online coding, takehome projects, poring over your resume, checking your github, looking at your social media/blog/whatever. But in the end hiring managers are just trying a handful of things to see if they can figure out if you know what you need to know to do the job.
    • They can’t just take your word for it – their reputation is on the line when they bring people in, and you reflect on them. They want to understand if you’ll be successful. Recruiting, hiring, onboarding cost a lot of money so they can only take so much of a risk – they’re not real particular what form the “proof you can do the job” takes, they’re just fishing for something.
  • Performance issues?
    • If you’re having some outside problem, talk to your manager. People we work with are a cross-section of just plain people, and we expect every medical, psychological, marital, criminal, etc. issue to show up at some point.
    • Communicate. Again, it’s in their best interest you succeed. No one “wants to get rid of you.”
    • Unreasonable demands?  Communicate.  Lots of technical staff work long hours or do the wrong things because they don’t “manage up” well and communicate.  “Hey, with this change I have way too much to get done in 40-50 hours, this is what I think my priority list is, this is what will fall off, what can we do about it?” Bosses don’t know what you’re doing every minute of the day and can’t read your mind. I’ve personally worked with engineers burning themselves out while their same-team colleagues aren’t because they aren’t managing themselves – though they think it’s outside forces bullying them.
  • How to get ahead?
    • Understand the options.  What are career paths there?  How do raises and promotions work? What are the cycles, pools, etc – you can only work the system if you understand the system. Managers are happy to explain.
    • Communicate.  No one knows if you want to move into management or not, or feel like you’re due a promotion or not, if you don’t talk about it with them over time. You have to take charge of your own development – companies want to help you develop but a manager has N reports and a lot of things to worry about, they aren’t going to drive it for you.
    • Listen.  What is needed to get to that Lead Engineer position?  “Slingin’ code” and “being here for 3 years” are, I guarantee, not much of that list of requirements. Usually things like leadership, image, communication, and exposure have a role. Read “How To Win Friends And Influence People” and stuff if you need to. “I just grind code by myself all day” is fine but there’s a max level to which you will rise doing so.

Anyway, thanks to all the managers that participated and all the attendees that grilled them!  I hope it helped people understand better how to guide their own careers.

Filed under Management

Why Blaming “Human Error” Is Wrong

I’ve been writing a LinkedIn Learning course on postmortems lately and digging into all the fun research on the topic (Dekker, Hollnagel, and so on), and deepening my knowledge on the things I hope most of you know (root cause is a myth, blaming “human error” is wrong…).

I came across an example that really brought it home to me why the continuous blaming of human error is wrong – not “mean,” not “unenlightened,” but just plain logically ineffective.

One of the classic examples of design choices contributing to aviation accidents is the similarity and close placement of landing gear and flap controls in airplane cockpits. Pilots lower the landing gear, then when they land they pull the flaps – but a small miss has them retract the landing gear instead and they pancake in.

In fact, the US Air Force did a study at the close of World War II where they looked back at all kinds of “pilot error” crashes and identified a bunch of design problems that contributed, and the flap/landing gear confusion was #2 on the list, forming 16%

“Analysis of Factors Contributing to 460 ‘Pilot-Error’ Experiences in Operating Aircraft Controls,” by P.M. Fitts and R.E. Jones, USAF Aero Medical Laboratory, Memorandum Report, July 1947.

Then, 20 years later, a major study on the same topic:

Aircraft Design-Induced Pilot Error, National Transportation Safety Board, Department of Transportation, Washington, D. C., PB # 175 629, July 1967.

And then, another 13 years later, suddenly they realized the same thing about small craft.

Well, there were very simple fixes mandated to this problem early on – the FAA now requires, for example, the landing gear control be shaped like a wheel and the flap control be shaped like a flap, so basic visual and tactile feedback is available to distinguish the two (especially at night, under stress, etc.).

There’s other simple tricks that greatly reduce these accidents, like putting a catch on the landing gear retraction. Suddenly the “human error” goes away (leading to the reasonable question of how broadly we define “human error”...).

So why did this “known” problem persist – damaging planes and killing people I might add – for 33 years?  In fact, it still happens, here’s a lovely writeup from 2015.

Basically, because all of the accidents where this happened continued to be declared “pilot error.” When there’s not any significant further inquiry (which to be fair, bothers people like airlines and the government and aircraft manufacturers and people with money), just saying it’s pilot error gets the problem over with by a minor sacrifice (a pilot) instead of doing any harder work.

At my new job, we’re doing risk analysis and commercial insurance for unmanned autonomous vehicles (drones). It’s interesting to now be working in a space that’s actually closer than tech to all this safety research. And you can see the same things happening.

Sure, actual research shows that it’s technical problems, not really human error, that is the problem most of the time – see

News article: More drone crashes caused by technical glitches, not human error, study shows.

Study: Exploring Civil Drone Accidents and Incidents to Help Prevent Potential Air Disasters

It cites technical problems as the cause 64% of the time and frankly doesn’t really distinguish human error from design-induced human error.

But of course you can still just blame human error.  I smelled something questionable in the recent news reports of how the crew was to blame for a UK Watchkeeper drone crash. Oh sure, the drone failed to land correctly so the crew intervened, so it’s their fault.  I suspect if they had not intervened and it had continued to malfunction it would also be “their fault.” Here’s the full Ministry of Defence writeup, which goes to the “loss of situational awareness” synonym for human error. And here’s a later Register article with more details, like “The most appropriate [flight reference card] drill …stated: ‘If UA [unmanned aircraft] not maintaining centreline axis: Engine cut……..Command’.” and that they were under supervision of contractors from the drone manufacturer. Apparently following the actual designated drill recommendation of cutting the engine still makes it your fault.

Of course, “Five drones – almost 10% of a 54-strong fleet bought from French firm Thales – have been wrecked in mid Wales crashes.”  Apparently they have a lot of navigation problems.  So when one goes to land in a populated area and is clearly not navigating properly and goes off the runway and the crew cuts the engine…  The crash is ‘their fault.’  Riiiiiight.

It’s actually a fascinating question – when drone operations are more and more autonomous, how long can we just hold the “crew” responsible for anything that goes wrong? It’s cheaper and less embarrassing, so I’m betting a good while.

For us in tech this opens up an interesting discussion, beyond the obvious statement of “if you do an incident postmortem and simply write it off to developer or operator error you aren’t doing your job”.

We can’t always fix all design issues immediately.  At what point, though, does not prioritizing a better-than-bandaid fix become negligence? “Thirty years,” like with the flap/landing gear thing?

There’s a lot of legislation that tries to protect the powerful from lawsuits etc. – but as autonomy becomes more common, how long will that last for us? Technology firms have managed to get out of being held responsible for endemic security flaws (largely thanks to Microsoft) for decades.

You can see this beginning to crumble in aviation with things like the recent Boeing 737 Max crashes. “It’s human error!” declares the Boeing CEO.  But people aren’t that dumb and the Internet helps information get out that was previously inaccessible.  So the next tack is blame the software, but that’s also buck-passing… The software was the band-aid fix on top of the design issues.

When will our lovely band of insulation finally be whittled away in tech? Soon, I’d bet… “Oh sure let’s crank out some self driving cars, I’m sure it’ll be fine and we can just give the standard ‘what me worry’ face when our crappy design ends up killing some soccer mom that we use when we mess up a software patch nowadays.”

In the end, if you are motivated by actual safety, or uptime, or security instead of the CYA game of who to blame, you have to push beyond the nearest human, or the outermost band-aid on your Rube Goldberg system, to improve the system. You’re going to have to consider how your design and interfaces of your software and the tooling you use to operate it contribute. You’re going to have to use facts and numbers and not soothing opinions to say “you know what? That goes wrong more than our other systems – there’s something wrong with it, we have to dig in and figure out what.”

Filed under DevOps

Community First! Village

Photo: DoD Organizer Family Tour

DevOpsDays Austin sponsored this great charity this year with our proceeds, and the program is so cool I wanted to do a whole post on it.

Community First! Village “is a 51-acre master planned community that provides affordable, permanent housing and a supportive community for men and women coming out of chronic homelessness.”  It consists of 200+ micro-homes and RVs and supporting infrastructure, they’re at 78% of capacity already, and they are planning for another 300 homes to be built. They’re located in southeast Austin out near the Travis County Expo Center.

Photo: Aerial View of Village

And it’s really nice! The primary kind of residence are little mini-houses, 180-200 square feet in size, with electricity but no plumbing.  There are standalone bathroom buildings with individual lockable rooms. There’s kitchen buildings for more extensive cooking. There’s RVs, more expensive but better for those with medical problems. There’s a community garden (with chickens and bees), a store, a hairdresser, a garage, a forge, and more.  Heck, there’s a bus stop and an Amazon dropbox.

Here’s a series of pictures I took on our tour.

Austin has around 2200 homeless, and the number continues to rise. My parents visited me in Austin a couple months ago, and we went out and ate and they were shocked by how many were on the street, especially as we drove through the “shelter district” downtown. There are many efforts to help, but this is an approach I hadn’t heard of before, and wanted to share with everyone.

How Does It Work?

Donna Emery, the Director of Development for Mobile Loaves & Fishes, gave us a tour and told us all about it. She’d love any of you to come tour the village as well! Mobile Loaves & Fishes as an organization has been serving the homeless for many years, and this is their deeply considered idea at making a permanent difference.

The village isn’t a shelter; it’s intended to be permanent. They identify candidates for the village via social workers and the array of people trying to help the increasing homeless population (there’s a database they all use to track homeless clients and try to get them services and such).  The person says they want to get into the Village, and there’s a roughly 12-month runway program to get them ready and in.

There are three rules to living in the village.

  1. Have to pay rent. Micro-homes rent for $275-$375/month, the RVs more like $435. They work to ensure they have their social services and encourage “dignified income” working in the village or otherwise. 96% of the residents pay their rent on time, which is better than your average apartment building!
  2. Have to follow civil law. This isn’t “anything goes”, and safety is paramount. They don’t turn you away if you have an alcohol or substance abuse problem – you’re only going to get over that if you have housing – but crime isn’t allowed. It isn’t a major problem for them; homeless are generally the victims, not the perpetrators, of crimes (other than the criminalization of being homeless, of course). Applicants do have criminal background checks – they don’t disqualify you out of hand for having a record, but they don’t allow sex offenders and they evaluate a past of violent crime carefully.
  3. Have to follow the rules of the community (like a strict HOA) – you have to care for your neighborhood. This isn’t a jungle, it’s a community. The place was very clean and well tended. (Pets are welcome, though! We spoke with a man walking his dog at length on our tour.)

Last year, residents earned $650k in “dignified income” – working in the gardens, crafting, doing maintenance, working in the garage and market…  You can make $900/mo from a job cleaning the community bathrooms, for example. Donna stressed that they don’t rely on handouts – it harms the dignity of the people and you don’t take care of things that are free. When a major tech company donated a bunch of tablets, they set up a monthly tablet rental.  “But those are free, we’re giving them to you, don’t make money off them,” they initially complained. But MLF explained that handouts are an unhealthy dynamic, and this way the renters respect the tablets – and themselves – more. They’ve put a lot of thought and experience into creating a place where communities and lives can grow for people that have had nothing.

Of course, they provide a lot of help, from social services to things like teaching them to use Netspend for money management.

Blue ribbon Austin businesses and organizations have donated a lot of the infrastructure to make this work – Alamo Drafthouse, HEB, Charles Maund, the Topfer family, and many more.

Really A Community

But the thing I found the most striking about this is that it’s really a community, and a part of the larger community around it.

40% of the residents are women. There have been two weddings so far among the residents and two residents passed away with their wishes to be interred in the Village. The average age of homeless coming there is around 50 and they’ve been chronically homeless for around 10 years. This isn’t an attempt at “give them a shower and shave and get them a job and send them back out into the wild,” this is a permanent home where they can belong as long as they want. Donna shared with us that what really makes persistent homelessness is some kind of crisis combined with a collapse of a person’s social relationships – no family, no friends to help. Being sent away from a community doesn’t tend to form better social support, does it?

From their FAQ:

It’s all about relationships. Mobile Loaves & Fishes desires to empower the community around us into a lifestyle of service with the homeless. We achieve this vision through Community First! Village by taking a relational approach for connecting with our homeless brothers and sisters, instead of a transactional approach. When we bring an individual into community with others, we truly begin to make a sustainable impact on their lives.

Mobile Loaves & Fishes believes that the single greatest cause of homelessness is a profound, catastrophic loss of family. That’s why our focus at Community First! Village is to do more than just provide adequate housing. We have developed a community with supportive services and amenities to help address an individual’s relational needs at a fraction of the cost of traditional housing initiatives. We seek to empower our residents to build relationships with others, and to experience healing and restoration as part of engaging with a broader community.

The businesses aren’t just for the residents – you can go there to the garage and pay to get your oil changed.  You can go attend their movie nights (the Alamo donated a projector) that are open to the public like any movie night in any park. They do things like a trail of lights during the holidays. There’s plenty of reasons for non-residents to go there, it’s not a “camp.” It’s just a subdivision, really, like any other one you’d drive through in Austin.

Heck, you can go live there. 170 of the occupants are former homeless, but there are also many “mission families” living there with them to provide help and more strongly tie them into the social fabric of the Austin community.  Or you can rent spare homes on AirBNB!  They have a hall (“Unity Hall”) that can accommodate up to 300 and there’s a commercial kitchen attached (also staffed by residents) so you can host events there – we started seriously looking at it for smaller tech events. (More pics are in the slideshow above).

How Can You Help?

Let’s get real.  If you’re reading this tech blog you’re probably incredibly well off. Working for a company that’s incredibly well off. We have an embarrassment of riches in the tech scene here in Austin, living next to people with nothing. In DevOps we talk continually about collaboration, sharing, and community – one would think that our appetite for helping the less fortunate would go farther than just making sure you get an underrepresented person on your next tech panel.

You can help with funding.  Their Phase II capital campaign is building more homes and supporting buildings, a clinic, and more. Eventually they want things like dental care (an especially hard problem; it’s relatively expensive but dental problems unheeded turn into medical problems quickly). You can give, you can encourage your company to give. DevOpsDays Austin made spare money from sponsors, so we were able to put $25,000 into sponsoring one of the homes in their next phase.

You can help by volunteering. Persons or groups can email them and get set up to come help!  Get your church or other organization involved. They’ve had over 100 Eagle Scouts do their projects out there.

You can help by participating in your local government.  They had a long battle to be able to start the village and had to locate outside the City of Austin because of the never-ending NIMBY-ism of residents not wanting “those people” anywhere near them. Advocate for compassion and the homeless in your city council and other venues.

You can help even by just going there, using the businesses, interacting with the residents to weave them into the fabric of Austin. Go on a tour to see what they’re doing out there. Bring your kids! We all had a great and deeply moving family outing in our visit to the Village.

Filed under Conferences, DevOps

Incident Management Course Coming!

I know we’ve been quiet on the blog, all four agile admins have been busy – several of us moved to new jobs, everyone has a lot going on.

But we’re still doing stuff!  I just went out to Carpinteria to film a LinkedIn Learning course on Incident Management.  The agile admins have a full DevOps curriculum on LinkedIn Learning (which was lynda.com); most of the courses are in the “Become a DevOps Engineer” learning path!  You can view them as a LIL member or they can be bought individually nowadays too.

We’ve done the 101 level (DevOps Foundation), the 201 level (CI/CD, CM/Infrastructure as Code, SRE, Monitoring and Observability, Lean and Agile) and now we’re hitting more details – Karthik’s done a bunch of Kubernetes and Cloud Native courses, Peco is doing more monitoring courses, James is doing DevSecOps courses…

And I just went and filmed an Incident Management course.  Incident Response, really; I’m hoping for a subsequent course that focuses on retrospectives (each class is only like an hour long and retros are a huge fun topic so I wanted to give them enough time on their own).

Pictured are my producers Adam and Lori and my live action director Julia (who’s also done some of my other courses!) This was a slides course (my first), but they have a program where they can add in a little live action, and since I’ve done it a bunch and Julia’s great we burned through a bunch of scripts in a short time on camera! Thanks to all of them (and my content manager Brian Anderson, not pictured).

The Course

I’ve been creating IM processes and training and leading organizations in them for a while now. A good incident response program removes friction and lets your smart technical staff focus on one thing, solving the problem, without having to worry about what to do otherwise. When I left AlienVault, the #1 thing people came and said to me in my 2 week notice period was “Hey, that incident management process, that’s really made a huge difference,” which is great to hear.

And it was a good opportunity to refresh on the newer developments in the field.  I first got into modern IM, which I define as “derived from the Incident Command System”, in 2008 after I heard Brent Chapman speak at Velocity on Incident Command for IT: What We Can Learn from the Fire Department.  But (aside from retros) while that concept spread, for 5-6 years there wasn’t really a lot more in terms of new developments. Luckily that’s changed, and there’s been a lot lately. John Allspaw and J. Paul Reed have both done master’s theses with Lund University’s Division of Risk Management and Societal Safety; there’s a new O’Reilly book Incident Management for Operations as well as IM being a hot topic in the Google SRE books, and so on. The REdeploy conference and Thai Wood’s Resilience Roundup weekly email newsletter and the Oncall Nightmares podcast are full of late breaking developments. (These sources and more are listed in the course handout!)

Special thanks to J. Paul for giving me guidance on the course content and giving me permission to use his and Kevina Finn-Braun’s Incident Lifecycle Model in it.

Expect video topics like:

  • Why Do I Need Incident Management?
  • The Incident Command System
  • Scoping the Problem
  • Your Incident Toolchain
  • Incident Toolchain Example
  • Detecting and Reporting Incidents
  • First Response and Escalation
  • Incident Communication With Your Users
  • Communicating Inside Your Organization
  • Best Practices for Diagnosis and Repair
  • Cleaning Up After
  • Continuously Improving
  • Training and Game Days
  • Implementation Challenges

Oh, and I got to use props for the first time (like that fire extinguisher in the lead pic), we threw some in for kicks. Fun!

The Experience

Speaking of that, I just wanted to give the LinkedIn Learning team a shout-out.  Making courses with them is a great experience, class all the way.  They are all super skilled at what they do and super friendly. Going to their campus/studio in Carpinteria, CA is always an exceedingly pleasant experience. Everything’s top notch, sound booths, live action studios… It’s not the average webcam tech course when you’re looking down the barrel of a camera with a director, a producer, and a sound/teleprompter person fussing over the fine details! If you are an expert in something (not just tech) and are interested in doing courses, I’m happy to introduce you to someone there; it’s all top quality.

And they treat their people well there!  As best as I can tell they always have, from when they were Lynda to when they were LinkedIn to now being owned by Microsoft. Lori confided in me, “I was a documentary filmmaker with a non-profit for years and I didn’t know jobs like this existed; I’ve never been treated so well.”

While I was there they were doing their monthly “InDay”, and apparently this is the most anticipated one of the year as it’s game themed. They had inflatable human foosball, arcade games, did up the cafeteria with a Stranger Things theme, even had a D&D training session.

 

And of course Carpinteria is beautiful, right on the beach, extremely temperate. It’s between Ventura and Santa Barbara, just north of LA. If you go out there, my hot tips are the nearby Shoals restaurant (a little down the 101) where you can get a table right on the water, and Chocolats du CaliBressan, a French chocolatier at the far north end, on the beach side of Carpinteria. Oh and the booze is super cheap in the supermarket, so we always make some gin and juice and hang out in the Holiday Inn’s hot tub while we’re there…

 

Filed under DevOps

DevOpsDays Austin 2019 Retrospective

As mentioned, DevOpsDays Austin 2019 went off great!  And after the event, we sent out extensive surveys to attendees, sponsors, volunteers, speakers, and even the organizers to learn and improve. (Thanks to everyone who gave their feedback, we appreciate it!)

Last year we also did an extensive retrospective to figure out how we wanted this year to go, and this year’s event was driven by that feedback and our vision to make DoD Austin the place for practitioners to come, learn from each other, and build the local community.

Let me share this year’s retro with you – some of the numbers and sentiments are below with my thoughts. If you want the full details, sure, here you go!

Full DevOpsDays Austin 2019 Retrospective (pdf)

If you’re not familiar with an NPS score, it’s used to measure sentiment on a scale from -100 to +100.  When you get asked “would you recommend” something on a 0-10 scale, generally they’re taking that number and bucketing it into 0-6 being detractors (counted as negative), 7-8 being neutral, and 9-10 being promoters (counted as positive). The score is the percentage of promoters minus the percentage of detractors. Above 0 is “good”, above 50 is “excellent.”  See more about NPS scores here.
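If you want to run the numbers on your own survey, here’s a minimal Python sketch of that bucketing. It’s not what our survey tool actually does under the hood; the function name `nps` and the sample ratings are made up for illustration, and it assumes whole-number ratings where 6 and below are detractors and 9 and above are promoters.

```python
def nps(scores):
    """Return the Net Promoter Score (-100 to +100) for a list of ratings.

    Assumed bucketing: 6 and below are detractors, 7-8 are neutral,
    9-10 are promoters. The score is % promoters minus % detractors.
    """
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))


# Hypothetical survey responses, not our real data.
sample = [10, 9, 9, 8, 7, 10, 6, 9, 10, 3]
print(nps(sample))  # 6 promoters, 2 detractors, 10 responses -> 40
```

Note how jumpy the score gets with small samples: with only two responses, one neutral and one detractor already lands you at -50.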

Sorry about the quality of the pics, these are basically ones I snapped myself on my iPhone. But hopefully they show some of what happened at the event!

Attendee Feedback (62 NPS, 50 responses)

Damon Edwards

“Informative, laid back, friendly, humorous event. My favorite conference for a couple of years now.” 84% of attendees said they were likely to return.

The things people liked the most as measured by the freeform comments were the openspaces (9 comments), the speakers/talks, especially their diversity (8 votes), the culture/atmosphere of the event (5 votes), and the community and people (5 votes).

This makes me happy. DevOpsDays isn’t just “a conference,” it really focuses on building community – people meeting each other in a friendly and collaborative environment. The content is nice but it’s not the primary value of the event.

Mandy Whaley

The concerns people raised most were “Nothing/great job” (10 votes), difficulty with travel and parking at the venue, including handicap access (6 votes), talks (6 votes), and wanting better lunches (4 votes).

Read on for more but we’re probably changing venues next year and will keep access in mind.  Now on the lunches – we used to have fancy lunches and they were a significant time and effort sink, with long lines, lots of time spent, and so on.  We moved to box lunches and now lunch goes fast and easy and leaves everyone more time to interact with each other.  We do not plan to ever change back from that, but we will see if we can get a BBQ place or something to do a nice lunch box.

(There were more likes and dislikes and we are evaluating action on all of them, but dang this post is going to be long already so I’m focusing on the top line items.)

Speaker Feedback (90 NPS, 10 responses)

Pete Cheslock

  • “Everyone was really positive; welcoming, low-pressure environment.”
  • Experience – 50% excellent, 50% very good
  • Organization – 40% extremely, 50% very organized
  • Friendliness – 90% extremely, 10% very friendly

Likes: No tech problems/helpful techs/setup organized (x4), Supportive/welcoming (x3), Engaged audience (x3).  Dislikes: Chromebook support problem, schedule slippage, openspaces competing with Conversations talks.

Great overall, some things for us to tweak!  After several years in the same venue and buying a lot of gear, our crack AV team have the tech end of it pretty much down pat.

Jon Loyens

Organizer Feedback (88 NPS, 8 responses)

  • “Just [wanted] to say how much I enjoy working with the crew and watching it all come together to put on a great event for the community. I get a lot out of doing it each year and see my contribution as an important way to give back.”
  • Time spent – 62.5% just right, 12.5% little long, 12.5% little short, 12.5% way too short
  • 93% likely to return (the one that isn’t pleaded a heavy year at work coming up)

Major likes included working together (x3), inclusion (x2), and the opportunity to give back (x2). Dislikes included some stressing out and looking for problems, and speaker notification happening late. There was good discussion about explaining openspaces more especially for the newer folks.

It’s important to me that our organizers have a good time too – my assigned domain on the organizer team is “Organizers” – besides working the master budget and schedule for folks, I facilitate and try to ensure that this volunteer gig is not onerous, and I’m happy we seem to be there.

Deborah Hawkins

Volunteer Feedback (94 NPS, 17 responses)

  • Experience: 72.7% excellent, 27.78% very good
  • How much time you spend – 83% about right, 11% too much, 6% too little
  • 93% likely to return

We have a lot of volunteers from the community that come to slave away working the event for a free ticket and a couple meals, basically.  It’s very important to all of us that they have a good experience – these are the future organizers, and community members going above and beyond to give back to the community.  Boyd and Daria and the other organizers did a great job both organizing the work and making sure the volunteers had time to participate in the event and have a good experience – even given the storm-nightmare loadout at the end of the event. Thanks to all our great volunteers!

Sponsor Feedback (60 NPS, 10 responses)

  • “A++ highly recommend, etc. Y’all did a bang-up job putting this together, and the community is certainly a testament to your hard work and continuous efforts. I’ve told everyone at HQ that we need to learn from you.”
  • Experience – 70% excellent, 20% very good, 10% good
  • Liked: “Always a great event – excellent sessions, great opportunities to meet with customers and prospects.” Vendor area good. Friendly people and networking.
  • Disliked: Platinum sponsors were upstairs. Water bottles ran out. We want badge scanners. No day before setup. Only 1 minute blurb. Schedule off track. When will courtesy shipping be picked up.

So… Sponsors. For a number of years we kept expanding our sponsor offerings.  Then we realized the event had become too much of a traditional conference and we were spending lots of space, time, and effort on sponsors, when to be honest we don’t really need all that much money to put on the event.  Two years ago after a bunch of sponsor problems and everyone working themselves to the bone to provide professional conference services I did away with sponsor tables altogether. We let them back this year but really wanted to make the event not about that.  We also warn the sponsors up front this isn’t a “churn the leads” event, we want sponsors who are going to send technical people to engage with the community.

Did it work out that way?  Kinda. There’s too much expectation set up about what “conferences are like” and “DevOpsDays are like” and between the person purchasing the sponsorship and the people actually sent on site there’s a lot of room for expectations to drift.

Tristan Slominski

I feel like there’s plenty of big conferences for that kind of sponsor engagement.  DevOpsDayses didn’t use to be like that, but as time goes on and they all grow it’s tempting to “improve” by making it more sponsor focused. We love sponsors who engage with the community but we consciously balance their participation in the event.

Funny story… Like I said we only let sponsor tables back on a limited basis this year. But there was a run on them, and we sold out of the ones we needed to fund the event quickly and had a bunch of sponsors still wanting to participate, including ones who had participated for years. So we extended the sponsor room, just to let them participate, because we felt bad about excluding them. We always sell out, so that’s probably a sign that we’re doing fine there.

And we got to sponsor a house for the homeless with the spare money, so that’s spiffy.

Recruiter Feedback (-50 NPS, 2 responses)

This is a new addition that didn’t work out so well. We had imagined a big recruiter speed dating thing. But few recruiters and attendees signed up for it so we pivoted into a recruiter fair.  It was during happy hour, but half the attendees leave before that. We had them by the bar, but the DevOps Trivia during the happy hour was also a big draw.

While all the recruiters rated their experience “good” they had low traffic.

So, sorry that didn’t work out. But I stressed to the organizers that this wasn’t a failure – if we don’t try new things that don’t work out sometimes, we’re not trying hard enough.

We’re one of the great grand-daddy DevOps events. We have years of experience, ample funding, and a big community.  Smaller DoDs, especially ones getting off the ground, often need to hew close to the “standard format” for a safe launch and to pay their bills.  We can afford to experiment, so I strongly urge the team every year to try different things.  It’s OK if we appeal to different sets of the community each year.  It’s OK to not do something again (even if it went well) and it’s OK to try new things as stretch goals. I kinda like putting how we run our event where our DevOps mouth is, so to speak.

This lets us try things out first. We were the first DoD with a multi-content track. We created the new “Conversations” talk format this year. We keep innovating, and sometimes there’s just not a fit given the constraints of venue, time, people, and so on. So this one didn’t go off great, but to me that just means we’re legitimately experimenting hard enough.

Ernest’s Retrospective Thoughts

Overall it went great!  Smooth, excellent execution by everyone involved. I feel like the Austin tech community is stronger for our event existing and that’s what I want out of it.

My main challenge personally this year was with the talks.

We really went into this year with an intent to curate the talks to a pretty specific practitioner format. DoD Austin has a bunch of years behind it so we don’t necessarily need the DevOps “talk circuit” talks to fill slots.  We feel like we can be very specific about the experience we want to curate – no repeat talks from other events (go watch them on the Internet, everyone posts videos!), some preference to local speakers, encourage diversity both in speakers and in content…  But we didn’t execute on that well.  We started using Papercall this year and it makes it easy for people to mass submit to multiple events – a great feature but somewhat antithetical to our needs. We had 200 submissions for 20 slots and had a lot of weeding to do and had to turn away a lot of folks. And while we had good talks, they didn’t fit our proposed theme necessarily.

We also just selected talks late, to where it risked people whose talks were declined not being able to attend because we sold out our attendee cap.

The second challenge was with openspaces.  In general the larger the event, the harder it is to make openspaces work. Once there’s more than 25 people in an openspace the format collapses and it’s just “2-3 people talking to each other and everyone else straining to hear,” basically a super crap panel talk. Putting them in the luxury boxes in the stadium worked really well there, because only so many people can fit into one, so it was a forcing function to keep them small enough to work. So they went well overall.

But some folks didn’t like them. Each year we get some feedback from folks more used to traditional content.  “Maybe we should get the openspace topics submitted before the conference so they’re already on the schedule!” No offense, but over my dead body. That’s not what openspaces are about and openspaces are the heart of DevOpsDays. They are for what the actual attendees want to talk about right then; the entire point is that they’re not programmed content. Early DevOpsDays were a couple talks and then pretty much all openspaces.  My general attitude is “if you don’t want to participate in openspaces, this is not the event for you.” We need to explain openspaces more ahead of time though, to seed ideas and get new people to understand the format.  Our experiment with mini-talks and then linked openspaces worked out great, I went to two of them and got high value out of them.

Next Year

A couple big changes are coming next year.

First of all, we’re probably changing venue.  We’ve enjoyed the stadium a lot, and love the staff there, but we’ve probably done as much as we can with the event in that particular form factor.

We’re considering going entirely to the new 20 minute talk format.  They were well received – if you really have more content than 20 minutes, a linked openspace is probably the best venue to explore it with highly engaged attendees!  And it’ll prevent people just submitting their “same talk” as much. We can also get more speakers in!

Also, we know it’s a bummer that we’ve been capping attendance and sponsors and that people who want to attend get turned away. So far we’ve felt like we have had to, both because of venue capacity and to keep openspaces good and keep the great atmosphere and community and opportunities for engagement that make our event distinct.

Now that we have enough experience, we think we might be able to go bigger and still keep the small group and one-on-one interaction. We’ve all been to a bunch of conferences and seen other things – 1-1 mentoring table signups, for example, and other formats that facilitate it.  We’re also thinking about adding some “working groups” – opportunities to do something, produce position papers, whatnot, give the experts a really neat thing to do at the event.

And maybe even add on a third day, with all unstructured content. On a Saturday so people could bring their kids and stuff.

I wanted to just blaze big next year; the rest of the team loved the vision but reminded me how much burn-in there is on a new venue – getting A/V figured out, all the rough spots of a year one… So we may iterate into it, with getting a new venue and going slightly larger and trying out new engagement ideas next year, and then the year after saying “Big tent!  All are welcome!  Fly in for this one, no attendee or sponsor caps!” and making it a heroically sized event.

There’s no one right format for DevOpsDays – I encourage other organizers to keep experimenting as well.  Your event doesn’t have to be the same year to year; you can target different goals and audiences and sizes and such each time.

If anyone read this far, feel free and comment with your thoughts below! (Obligatory disclaimer, don’t tell me “well this isn’t right for my DevOpsDays” – that’s fine, none of this is to declare the “right” way to do an event, it’s just what is working for us in our community with our particular goals.)

Filed under Conferences, DevOps

DevOpsDays Austin 2019 Highlights

We held our eighth DevOpsDays Austin last month! DevOpsDays Austin 2019 was held at the UT Austin stadium for two days full of talks, openspaces, and so on. All the videos of the sessions are up on YouTube in the DevOps Austin channel that holds other years’ videos as well.

Here’s my top 10 countdown list of great things about this year’s DevOpsDays Austin!

Platinum Sponsor Suite

10. We brought the sponsor room back, and added platinum suites in the stadium luxury boxes so sponsors that wanted to hold sessions could do so. There were very well attended sessions in these suites!

9. We had two content tracks and a new “Conversations” talk format – a short 20 minute talk followed by a linked openspace for interactive demos and discussions and command line stuff that doesn’t do well in a talk session. We only had space for a handful of them but they were very highly rated and we’re considering shifting significantly towards them next year.

8. We made the happy hour more modest and onsite, but with DevOps Trivia from Patrick Debois!  We had a bunch of teams compete and it was a wild and woolly time. We even used Patrick’s zender.tv online trivia thing to let people outside the venue compete.

The remnants of the cupcakes

7. Our fine venue, food, and drink team and vendors… We ripped into some mini cupcakes at snack time!!!

6. The openspaces.  I actually got to attend some this year instead of just running around working.  And they were all brilliant.

5. Our organizers! We bestowed the title of MVP organizer on two organizers this year – Daria Ilic for her great job with communication and Dan Zentgraf for doing a yeoman job with the sponsors.

Special thanks to all the DevOpsDays Austin 2019 organizers: James Wickett (Speakers), Peco Karayanev (Speakers), Karthik Gaekwad (Swag), Daria Ilic (Marketing, Volunteers), Dan Zentgraf (Sponsors), Tom Hall (Sponsors), Boyd Hemphill (Volunteers), Scott Baldwin (Web site), Lee Thompson (AV), Carl Perry (AV), Ian Richardson (Attendees), Chris Casey (Signage and Slides), Richard Boyd (Venue, Food, Happy Hour), Asif Ahmad (Venue, Food, Happy Hour), Bailey Moore (Venue, Food, Happy Hour), and thanks to Laura from ConferenceOps for doing all our finances.

4. I let the other organizers talk me into buying the Jumbotron!  I am naturally thrifty so had resisted given the significant price tag in previous years, but we had a glut of sponsors and everyone really wanted it so I finally gave in. Karthik even changed his Slack name to JUMBOTRON to petition for it. It remains so until this very day. You have to respect the dedication. So behold – the DevOpsDays Austin Jumbotron! (Yes, that’s real, not Photoshopped.)

3. Check out the cool swag I got each organizer this year as a thank you gift – custom Vans with the DevOpsDays Austin mascot on them!  (They’re only $80, if a little work intensive to design on their site, feel free and steal the idea!) People always love our DevOpsDays Austin shirts so I wanted to give the organizers a really distinctive way to show their pride in the event.

2. A very special thank you to DevOpsDays Austin from Mandy Whaley and the Cisco DevNet crew, who have been sponsors and speakers and attendees for many years.  I wasn’t expecting this – they actually used their sponsor shout-out time to present us onstage with a heartfelt card that they read to the audience.

We appreciate everything that Mandy and the team bring to the event and the card was super touching.

1. What could be better than that, though, you ask? How can such a kind shout-out be number 2 on the list?

Well, we had a little problem, and that problem was a spare $25,000 from letting in the gold sponsors above our initial sponsor room cap because they really, really wanted in and we felt bad for them. DevOpsDays Austin (like all DoDs) is a non-profit, so while we keep a war chest to pay for next year’s venue and stuff, the rest has to go. In previous years we made some modest donations to the Capital Area Food Bank; last year we actually had enough spare money that we let each organizer make a $1000 donation to a charity of their choice. But this was a much larger chunk, so what to do?

Some of the organizers brought up a great opportunity they knew about and had personally donated to. Here in Austin there’s a really unique program going on, the Community First! Village – a planned community that provides affordable, permanent housing and a supportive community for men and women coming out of chronic homelessness.

Community First! Village Micro-Home

And it turns out $25,000 is how much is needed to build a micro-home in their next phase of expansion, to house a formerly homeless person in their community. These are little 180-200 square foot homes with electricity but no plumbing that are the foundation of their village. The whole organizer team got super excited about this opportunity.

So that’s what we did – we sponsored one of these homes to be built. We’re pleased to have the ability to help Austin in a permanent way out of the conference!

I’m going to do a separate blog post on this because it’s an awesome program that many companies in Austin have been getting behind, and it’s remarkably successful in helping our large homeless population. But thanks so much to all the sponsors and attendees that made this possible.

DoD Austin Organizer (and Family) Tour of the Community First! Village

We had a great time at DevOpsDays Austin this year and hope many of you did too. Next, we’ll publish a full retrospective that we hope some of you and other DevOpsDays organizers will find interesting.

Filed under Conferences, DevOps

Want to be part of the DevSecOps Handbook?

The word is out: at RSA this week Shannon Lietz (@devsecops), James Wickett (@wickett), John Willis (@botchagalupe), and I (Ernest Mueller, @ernestmueller) did a panel on our upcoming book, the DevSecOps Handbook.  We’re still writing it, and we want to make you a part of it!

Like the DevOps Handbook, also from IT Revolution Press, the heart of the book is case studies from practitioners like you.  Have you done something DevSecOpsey – adapted the culture of infosec/appsec to work better with your product teams, added security testing to your CI pipeline, added instrumentation and feedback loops for your security work, or other security-as-code kind of work?  Well, we want to hear from you!

We are interested in successes and failures, in both advanced implementation and people taking their first step – others will benefit from your experience in any of these cases.  You can be hardcore security dipping your toes into devops, hardcore dev or ops dipping into security, or someone getting started on the whole ball of wax. Don’t worry, we’re not asking you to write anything, we can interview you and do all the heavy lifting. Not sure if your company will sign off?  We can anonymize it, or if it’s been published publicly as conference proceedings or whatnot then journalism rules apply, we’ll just cite prior work.

To contact us, email book@devsecops.org or go to devsecops.org and fill out the form there. Or if you already know one of us, ping your favorite!

We’re ready to believe you!

Filed under DevOps, Security

SRE: The Biggest Lie Since Kanban

There is a lot of discussion lately about how SRE fits into or competes with or whatever-s with DevOps.  I’m scheduled to speak on an “SRE vs DevOps Smackdown” panel today here at Innotech Austin, and at the exact same time I see Bridget tweeting Liz Fong-Jones’ slides from Velocity on using SRE to implement DevOps. And the more I think about it, and see what people are doing, the more I’m getting worried.

The Big Lie

Just to get the easily provoked to put up their pitchforks, I don’t dislike SRE and I don’t dislike Kanban.  The reason I call Kanban a “Big Lie” is because really doing Kanban correctly and getting the value out of it requires even more discipline than doing something like Scrum.  But it looks so close to doing nothing new that many lazy teams out there say “they’re doing Kanban” and by that they mean they’re doing nothing, but they’ve turned on the Kanban view in JIRA for your convenience.  They have no predictability, they’re not managing WIP, they’re not identifying bottlenecks – they just have a visible board now and that’s it. I strongly believe from my experience that most teams “doing Kanban” are really doing mostly nothing.  There’s articles on this blog about how I make my teams I’m teaching Agile do Scrum first if they want to get to Kanban to build up the required discipline.  And I’m not just a crank, David Hawks from Agile Velocity just told our management team the same thing yesterday, which brought this back to mind for me and spurred this article.

Because I’m starting to see the same thing with SRE.  It’s not surprising – there was and is plenty of “DevOps-washing” of existing teams out there.  Rename your ops team DevOps, done. Well, at least DevOps was able to say “it’s a methodology not a job description or group name stop it” to force deeper thought – it’s why my team at work is the “Engineering Operations” team not “DevOps”, Lee Thompson insisted on that when he set it up! But SRE – yeah, it’s a team just like your own ops team, from an “org chart” viewpoint it looks the same. So doing SRE can – and in many shops does – mean doing nothing new. You just call your existing ops team SRE and figure you’re done.

A brief personal history lesson – my last job before DevOps hit was running the Web Systems team at National Instruments, an ops team.  That’s where we agile admins met, Peco and James were both ops engineers on that team! (Karthik was a dev we worked with.) We had smart people and did ops all right.  We had automation, monitoring, we had “definition of done” standards for new services. You wouldn’t have to squint too hard to just call that team an SRE team and call it a day. But, I wouldn’t wish that job on my worst enemy. It was brutal trying to do ops for just 4-5 dev teams, and that’s with business support, some shared goals, and so on. Our quality of life was terrible, we weren’t empowered, and no matter how hard we tried, success was always just out of our grasp. When we actually started a team using DevOps thinking at NI after that, the difference was night and day, and we actually began to enjoy our jobs as ops engineers. I would hate for anyone to deceive themselves into thinking they’re getting the goodness they should be able to get from a DevOps/”real” SRE approach while still just doing it the way we were doing it.

I have a friend at a local legal software firm, who told me they’re going through and just renaming all the QA folks to SWET (Software Engineer In Test), whether they can code or not, and all the Ops folks to SREs in this manner. One might be charitable and say they’re leaning forward and they intend to loop around and back that up with retraining or something, but… will they? Probably not, it’s just a rename to the hot new term without any of the changes to help those engineers succeed more in their jobs!

SRE isn’t “an implementation of DevOps” if you just apply it as a name for a hopped-up ops team.  Properly understood, it can be an implementation of one of the three parts of DevOps: Infrastructure As Code, Continuous Integration/Deployment, and Site Reliability Engineering. But note that reliability engineering doesn’t start with deploy to production; so much of it is Michael Nygard-esque techniques to write your app reliably in the first place. Reliability engineering, in usual DevOps fashion, requires dev and ops work both way back in the dev cycle and out in production to work right. It doesn’t need to be a different team.  If it is, and that team doesn’t get to decide if it takes over ops for a given app, and it’s not allowed to spend 50% of its time on reducing toil, and you’re not comping SREs like you do dev engineers – it’s not SRE and you’re a liar for calling it SRE. If you don’t keep DevOps principles in mind, you’re just going to get your old ops team with its old problems again.

That’s why SRE is a Big Lie – because it enables people to say they’re doing a thing that could help their organization succeed, and their dev and ops engineers to have a better career and life while doing so – but not really do it.  Yes, there have been Big Lies before, which is why I cite Kanban as another example – but even if the new criminal is pretty much like the old criminal, you still put their picture up on the post office wall.

Frankly, anyone pushing SRE that doesn’t put warning labels on it is contributing to the problem.  “Well but it mentions in chapter 20 of the second book,” said someone responding to the first version of this article on Twitter.  Not good enough. If something you’re selling is profoundly misused it’s your responsibility to be more up front about the issues.

The Little Problems

Now there are legitimate issues to have even with the “real SRE” model, at least the way that it’s usually being described.  The Google books kinda try to have it both ways, describing it as an engineering practice (how I describe it above and in the SRE course I did for LinkedIn) and describing it as “a team that works this way.”  Even among those not SRE-washing classical ops, the generally understood model is that SRE is an org/job title for a production operations team.

There’s an issue here, the problem of specialization.  If you are Google scale, well, then you’re going to have to specialize and a separate ops team makes sense.  But – first of all, you are not Google scale.  In my opinion, if you are under 100 engineers, you are committing an error by having a separate ops team. You need your product teams to own their products. Second of all – I don’t want to make an enemy of all the lovely Google engineers out there, but is your experience with Google services that they evolve quickly and get better once they go to wide release?  It’s not mine.  They rot.  Have you used Google Hangouts lately without it ending up with cursing and moving off to someone’s Zoom? That kind of specialization still has its downsides in terms of hindering your feedback loops that let you improve (the Second Way). Is SRE just Google-ese for “sustaining?”

I get that the Google folks say they still get feedback and innovation using the SRE model, I’m sure they do and they work hard at that, but that doesn’t change the fact that running a separate ops team is making a deliberate tradeoff between innovation and efficiency. There is no way that you get as much feedback or improve as quickly with a separate team. You can compensate for it, but you’re still saying “look… Not as important.” Which is fine if that’s your situation, I worked at many companies with 200 abandoned apps in production and you had to do something.  But “not getting there in the first place” is better.

Some of the draw of the model, and why Google is highly aligned with it, is Kubernetes itself. k8s is very complex to run, which drives people back a little bit to the old priest-in-the-tabernacle model of “someone maintains the infrastructure and you write the app and then you have them deploy it,” but now there are some standards (like deploying as a container) that make that OK – I guess? But if you think reliability, and observability, are the primary responsibility of an ops team that is not involved in constructing the application, you either have deep and profound company standards that allow seamless plugging of the one into the other or you’re fooling yourself. 90% of you are fooling yourselves.

At this conference I heard “Service meshes!  They get you observability so your devs don’t have to think about it.” Do you not see how dangerous that mindset is?

SRE, as interpreted as “a separate newfangled ops team,” may work for some but you need to be realistic about the issues and tradeoffs you’re making.  Consider whether you’d be better off with product teams supporting their product, maybe with aid from a platform team making tooling and an enabling/consulting/center of excellence team that can give expert advice.  DevOps helped us see how the “throw it over the wall from dev to ops” model was profoundly harming our industry.  Throwing it over the wall from dev to SRE doesn’t improve that, it’s profoundly regressive. Doing SRE “right” to compensate for this, like doing Kanban right, requires more skill and discipline, not less – be realistic about whether you have Google levels of skill and discipline in your org, eh?

Conclusion

SRE (and Kanban) aren’t bad, they have their pros and cons, but they are easy to “pretend to do” in some minimal, cargo cult-ey way that gets you little of the benefits. And if you think spinning up an ops team and calling it SRE is “an implementation of DevOps” you’ve swallowed the worst poison pill the DevOps talk circuit can deal to you.

Filed under DevOps

DevOpsDays Austin 2018 Videos Posted

Well, we were “unplugged,” but we managed to smuggle videos out anyway for your pleasure… Watch ’em, like ’em, comment to the speakers that you appreciate them giving to the global tech community!  Especially since this year the talks weren’t pre-selected. Voting on talks was done at the event, so these folks prepared a talk without knowing whether they’d get to give it, which takes guts!

Filed under Conferences, DevOps