Author Archives: Ernest Mueller

About Ernest Mueller

Ernest is Engineering Manager at Verica, in Austin, TX. More...

Upcoming Agile Admin Stuff

We’re not being idle, there’s a lot going on here in lockdown.

  1. James is speaking at the DevOps Institute’s DevSecOps SKILup Day today, go check that out!
  2. Karthik’s going to be on the DroidDevCast podcast this Friday talking about testing!

  3. James, Karthik, and Ernest all are running the Modern Infrastructure track at All Day DevOps in November. Sign up now! And we have a couple talk spots free – if you think you have a super amazing talk, and especially if you are from an underrepresented group, hit one of us up (Twitter DMs are fine) and we look at getting you in.

Leave a comment

Filed under Conferences, General

Chaos In The Time Of Covid

“See, here I’m now sitting by myself, uh, er, talking to myself. That’s, that’s chaos theory.”
– Jeff Goldblum as Ian Malcom, Jurassic Park

Hello all!  I know it’s been a while between posts here at the Agile Admin. Everyone’s safe and healthy, though. We’ve been quiet but busy, and in a twist of happy fate many of us are working together again!

James, Karthik, Bill, and Ernest are all now at Verica, a chaos engineering startup founded by Casey Rosenthal and Aaron Rinehart. It’s 100% remote, with people from Portland to Portland, as we like to say (Maine to Oregon). We’re still working on the initial release of the product; my team (which includes Karthik and Bill) is working on the Kubernetes module.  Really interesting stuff!  Casey literally co-wrote the book on chaos engineering and it’s a fascinating bleeding edge part of systems engineering.  If we can lure Peco over to join us then we’ll have the entire band back together!

Community stuff has been quiet.  We had to fight to get our money back from the venue from cancelling DevOpsDays Austin 2020, but we finally prevailed there.  The CloudAustin user group is meeting virtually.  And James, Karthik, and I are running the Modern Infrastructure track for All Day DevOps in November. Karthik’s speaking at KubeCon in August. We’re doing light work on some LinkedIn Learning courses, though I have to admit with the additional burden of the pandemic I haven’t been getting a lot of side stuff done personally.

As we get on our feet with chaos experimentation and this dang lockdown eases up we hope to be back with our usual mix of technical insight and Texan cussedness!  Because let’s be honest, Twitter is where you go to get dumber.

Leave a comment

Filed under General

Pragmatic Pipeline Security

Check out agile admin James Wickett’s talk from DeliveryConf last month on adding security into your continuous software delivery pipeline!

Leave a comment

Filed under Conferences, DevOps, Security

How do I start with Kubernetes in the enterprise?

Fellow Agile Admin Karthik Gaekwad did this video interview late last year on k8s in the enterprise from his perspective as an Oracle Cloud devrel, and I thought I’d share it here!

1 Comment

Filed under k8s

Record and Replay Browser Testing, Take 2

Recently I reported, and I quote, “Bah!” from trying a bunch of record and playback cross browser testing options. To recap, we’re a startup, our devs are writing UI tests but we don’t have cross-browser testing, so I tried to find something where I could record and replay our pretty simple Angular UI flow and get some cross-browser testing on it without needing code. And I didn’t find anything that worked. But I got a bit more traction after that first run at it, so here’s part 2.

EndTest

The EndTest support folks looked into it and fixed my test so it worked. Some were alternate locator schemes.  Some were advanced options I am not sure how I would have figured out (to close those pesky multiselect dropdowns, you can’t just click on the overlay backdrop that comes up, you have to offset by some pixels…).  I am not a front end guy, I just fake it, so this is a little daunting, but their support seems able to help with tests so it’s doable.

I now have a working test, somewhat generated from the recording and somewhat generated by programming. The trial doesn’t allow other browsers by default but they turned them on for me.  With some light fiddling I got them all running, and the only issue was IE not running because it didn’t like that pixel-offset workaround from the above and EndTest fixed it by changing my test to hit “enter” instead – fair enough.

I then also made a test for the second part of our flow that has a PDF that needs validation, you can do it with a screenshot comparison.

So after help from their support, I have a working test suite – it’s a stretch to say it’s pure record and replay; it’s record, edit the locators a bunch, and replay, but since my last iteration on this was “none of these solutions work and do crossbrowser”, that’s a big win.

Sauce Labs

Last time I had been trying to use the Selenium IDE for record/replay and integrate with Sauce for the crossbrowser testing. They got back to me and extended my trial minutes and gave me some tips on how to get some of the tests working.  They don’t support the Selenium IDE however so they don’t really help with the tests per se.

I had a weird experience, there were intermittent (but frequent) timeouts from running tests against Sauce with selenium-side-runner, just at some point 0-4 minutes into the test it’d hang and time out and give me a “ETIMEDOUT connect ETIMEDOUT 66.85.49.22:443″.  

And then I got into the lovely rich set of test things that don’t work the same across the browsers.  Oh, in Edge “the element isn’t clickable because it’s obscured.”  In Firefox on Mac “that radio button isn’t clickable because the inner part of the radio button obscures it.” All different elements.  The problem is, fiddling with the locators to get something that browser likes breaks it in the other browsers.  But the app works in those browsers, just not the tests.

Anyway, Sauce support said “we can’t support Selenium IDE questions” so I guess that’s it.  (I don’t know that those timeouts when running tests on their service count as selenium IDE questions but whatever). I had hope when I found Selenium IDE that a record and replay with Sauce was feasible but it seems like it’s a starting point to seed your own Selenium code at best.

mabl

A sales rep from mabl wouldn’t let go of me till I tried their solution. So I did and it has a lot of promise.  It tries a bunch of different methods to find a locator automatically and self-heals the tests, which is great.  On the one hand the tests are slow; a 4 minute selenium test is a 6-8 minute mabl test. On the other hand it works!

It worked the first time in fact (Well… second, but it was because I went into a click frenzy trying to get the mabl trainer window and our Hubspot popup out of my way). I now have a working mabl test, though it is having one-minute timeouts and confusion on that same “click on the backdrop to get out of the multiselect dropdown” issue that’s a pain in all the other solutions as well, it does it but after a long timeout and it doesn’t self heal it for the next test run so it works but is slow. I hate multiselect dropdowns. Anyway, after a discussion with mabl support it turns out that it tries a bunch of ways to find a locator and then self heals if it found a better one, and then tries a bunch of ways to click/exercise that locator but doesn’t self heal from that hence the long timeout in my test.  I gave them the feedback “self heal that too, yo!”

OK so it’s working, how about cross-browser?  I add Firefox – it works.  IE and Edge are only on the “enterprise plan” but I ask them to add them to the trial, they do, I run them, and…  They work!  Safari… Well, a problem there, but mabl looks at it and thinks it may be them. I’m super impressed, of all the stuff I’ve tried only GhostInspector actually recorded and replayed without significant recoding and that was only Chrome/Firefox.  They’re still working on a fix as of “press time.” They also do PDF testing, which we need..

So it works great and looks good!  And they have a lot of cool dev integration and stuff to get into, it uses a branching model for the tests, you can run the tests locally…  Great looking ecosystem. You can’t choose like OS platform though, the Chrome and Firefox are just on Linux. At our current level of detail that’s fine.

Next step is discussing pricing (Web site just says “contact us”)… And unfortunately it’s way, way out of reach of a 10-person startup. The cross-browser and PDF testing are part of the “Enterprise plan” but even if I sweet talk them into putting those features in their lowest cost plan it’s still cost prohibitive for us while we’re in seed round.  I mean, it’s totally worth it because it works out of the box without fiddling around, if I were still at AT&T Cybersecurity I’d make someone spring for it for sure, but it’s an extremely significant pricing step above these other solutions and not at the value point for where we are right now.  Dang.

New Bottom Line

OK so my new bottom line is this.

  1. If you just want quickie Chrome/Firefox on Linux record and replay, GhostInspector works out of the box. Nice but I want more browsers, doesn’t quite fit my needs.
  2. If you want record and replay on various OS/browser combinations and are willing to do some re-coding and testing of it to make it work, EndTest does it. Fits my need with a little pain, and is affordable.
  3. If you want cross-browser record and replay without hand coding, without full platform choices but with a bunch of cool dev friendly extras, mabl does a great job – for a lot more money than the other options.  Best for my needs, but most expensive.
  4. The other options basically don’t work on Angular (Crossbrowsertesting), or possibly work but with intensive time investment that makes it not really record and replay IMO (Sauce + Selenium IDE).
  5. Though if you don’t need a bunch of CI-driven executions per day and don’t care about all the platforms you can probably just use the Selenium IDE to do 3 major browsers locally installed on whatever laptop you have yourself for free (Safari/Chrome/Firefox on Mac or Chrome/Firefox/Whatever Microsoft Is Pushing Today on Windows). Free but DIY.

So for us being on a seed round budget, EndTest is probably the best compromise of functionality and price at this point, YMMV of course.

1 Comment

Filed under DevOps

Trying To Record And Replay Browser Tests… FML

I’m working for a startup right now and we don’t have a huge excess of development staff.  Our devs have been implementing UI testing in Cypress, but we also need some wide cross-browser testing of our front end Angular apps – we’d already found a couple blocker bugs on Edge and IE largely by accident.  The devs are all busy devving, so I figured I’d take that on. I said, “Well, there’s products where you can just click to record a UI session and replay it in other browsers without writing a bunch of code, let’s try that out.”  Most everyone has a free trial nowadays so I could see which ones were best. Then the pain began.

Sauce Labs – Part 1

I had used Sauce in a previous life, when we had a bunch of Robot Framework/Selenium tests and I liked it.  So I went there first.  Unfortunately, they have no record/replay capability, verified by their support, so I moved on.

But I came back later because I had found that there was a Selenium IDE that’s a record and playback tester that you can integrate with Sauce by using selenium-side-runner.

Selenium IDE is very cool, its killer feature is that as it records it copies various ways to address an item on the screen – css, xpath, full xpath – and when it replays if the first one doesn’t work it tries the next one and if that works tells you “hey you should update this test.” That’s great because UI testing is shitty and unreliable at best, and once you have Angular generating ever-changing ids for elements it is even worse.  The only bad thing is you have to go add assertions in manually afterwards.

Screen Shot 2020-01-09 at 10.58.47 AM

So in fairly short order, I managed to get a reproducible Selenium IDE script that exercised our Angular app and works.  The app’s just like 7 screens of form fill, it’s not crazy.

Well, then I tried to save it as a “.side” project and feed it through Sauce by using selenium-side-runner, which is just:

npm install -g selenium-side-runner

selenium-side-runner –server <sauce-url> -c “browserName=’chrome’ version=’latest’ platform=’macOS 10.14′” ‘Paul Precision.side’

You get that sauce URL that has credentials embedded under User Settings/Driver Creation in their UI.

Unfortunately once I push it to sauce (starting on the same OS/browser, which you go get the tokens for from their Platform Configurator) – problems. The player is great, it shows the video (even live while testing) synced to the step taking place (unfortunately since I’m piping it in, it’s not showing the steps in the test syntax, but in raw Selenium execution syntax).

Screen Shot 2020-01-09 at 12.37.42 PM

I fixed most of them by going and changing selectors away from CSS to xpath, then sitting there iterating with chrome dev tools and the IDE trying new ways to use an item that works in chrome then works in selenium IDE and then… Doesn’t work on Sauce. I have gotten it 90% working but the last 10% is blocking me.

CrossBrowserTesting

Next I tried SmartBear’s CrossBrowserTesting.com. An all-in-browser recorder that worked great!  And then the replays didn’t work.  I messed with it a while and contacted their support, who said “Oh yeah it doesn’t do angular, it’s for static pages.”  So on to the next one. Who uses static pages, this is 2020?

The interface is nice enough, editable steps next to a running video (though not synced up).

Actually looking at it closer I bet I could do the same “edit all the locators” deal and try to get it to work but… My 7 day trial is over (a week shorter than the other options) so I guess I can’t try.  It didn’t do the nice multi-locator guessing Selenium IDE did but it does seem to have several options in a dropdown while I edit the tests, and the recorder is integrated into the offering so that’s nice – the UI was good overall. Unfortunately the super short trial and the presales support saying “Angular? Go away!” prevented me from really seeing if it can work for us.

Screen Shot 2020-01-09 at 12.38.18 PM.png

GhostInspector

Demoralized, I head to Twitter, and someone recommends GhostInspector.  You record it with a Chrome plugin and then replay in the browser – video, but then it shows the editable steps next to a screenshot showing % change from the last screenshot (the steps aren’t synced to the video, which would be better) .  You can do assertions while you’re recording.  And the replay works the first time and every time – Hallelujah!

Screen Shot 2020-01-09 at 12.54.11 PM

And then I look to set up cross-browser and discover they only support Chrome and Firefox, and to even do that in an automated manner you have to duplicate the entire test suite.  I was so disappointed, it worked perfectly otherwise.

Seriously y’all if you add more browsers I’ll pay you immediately for this.

EndTest

Determined to make this happen I find EndTest and, after verifying they support a full OS/browser matrix, try them. They also use a browser plugin recorder like GhostInspector.

I’ll be honest, the UX is terrible.  Besides the 1990s colored icons, everything is always a click away – you have to watch the replay video separately from looking at the logs from looking at the steps from editing the steps. Everyone, the magic combination is editable steps to the left, running video and logs to the right, highlight the step you’re on as it plays. Anything else harms your usability.  And also while editing steps you can’t add a step in just anywhere, you have to add it at the end of your 100 steps and then drag it up page by page…  And often when you do that you just get “error saving test” messages for no reason. Argh.

Screen Shot 2020-01-09 at 1.08.46 PM

But… The recording is quick and then it is semi working.  Tempting.  Now I start the iterative edit-replay-debug cycle.  It is slow. You get to give your steps a name but those names don’t show up in the test output, because why would they.  After an afternoon of fiddling, I’m halfway through a 7 screen flow. Their support was nicely proactive and reached out to me about an error (I was looking for text with a $ in it and you can’t do that, but you can define a variable and then use that…)

It’s at this point I also find the Selenium IDE and bring Sauce back into the mix.

Keep Trying – Sauce and EndTest

Next, what I was doing was fiddling with the steps in the Selenium IDE, then pumping those changes both into Sauce via CLI and manually editing them into EndTest’s UI, desperately hoping to get one to pass (they don’t act the same under the same inputs for whatever reason).

Locator by locator I grind through making the test work.  I have a lot of trouble where we use multiple option mat-selects, because they “stay open” while you select items and I can’t get them to close.  I try sending ESCAPE keys but can’t get that to work, I try double clicking on other things…  One of our devs figured out the magical thing to click on was the overlay backdrop (css=.cdk-overlay-backdrop) to close the damn multiselect box.

This takes several grueling days.  I ask support folks for help but don’t really get any useful traction.  Finally, I get a magic combination in the Selenium IDE that also works in Sauce!  I try the same ones in EndTest and they don’t work.

Screen Shot 2020-01-10 at 11.15.10 AMScreen Shot 2020-01-10 at 11.15.18 AM

It’s super frustrating.  The same locator doesn’t work in all 3 tools, often forcing me to choose a less portable option – instead of something resilient to change like “xpath=//span[contains(.,’Visual Line of Sight’)]” – which works in some cases – I end up having to use something like like  xpath=//mat-option[@id=’mat-option-87′]/mat-pseudo-checkbox (and sadly in angular material those IDs randomize unpredictably). Like, there will literally be two identical-except-for-the-text-and-ids-in-them widgets one after the another and one kind of locator works on the first one and not on the second. No idea why.

Sauce Labs – Part 2

OK, so of all the options the only one that actually works for me and will allegedly do crossbrowser testing is an unsupported combo of Selenium IDE and Sauce run off the command line.  A couple sources I found over the course of this:

Not optimal, but at this point I’m a week in and taking what I can get.  Let’s try an actual crossbrowser matrix now.  Bonus hacky Bash script:

#!/usr/bin/env bash

tests=("Paul Precision.side")

platforms=("browserName='chrome' version='latest' platform='macOS 10.14'"
        "browserName='chrome' version='latest' platform='Windows 10'"
        "browserName='chrome' version='latest' platform='Linux'"
        "browserName='MicrosoftEdge' version='latest' platform='Windows 10'"
        "browserName='safari' version='latest' platform='macOS 10.14'"
        "browserName='firefox' version='latest' platform='macOS 10.14'"
        "browserName='internet explorer' version='latest' platform='Windows 10'")

for test in "${tests[@]}"
do
        for platform in "${platforms[@]}"
        do
        echo Running "${test}" "${platform}" 
        echo
        selenium-side-runner --server https://<secrets>@ondemand.saucelabs.com:443/wd/hub -c "${platform}" "${test}"
done
done

Chrome on MacOS – works.  Chrome on Windows – works.  Chrome on Linux – for some reason can’t find a selector early on.  Edge on Windows – weird proxy 400 error, won’t even load the page.  Pretty sure that’s not my fault.  Safari on MacOS – can’t click on the first things it needs to click on.  Firefox on MacOS – same error?  Really?  Now IE… Out of minutes (despite the UI telling me .6 automated hours remain).

I have tried all these os/browser combos manually and they work.

So my conclusion is all these suck and I guess I just need to pay manual QA people to click on our app.  Great.  Or for Cypress to get off their butts and add cross-browser support, which they say “is coming” for three years now.

We’re a startup and time is money, so in the end cross-browser testing is not worth the hassle in all these solutions.  But it is important and I’d love someone to make a solution that actually works for it.

P.S. Please do not suggest another solution unless it has a) UI record and replay capability and b) is cross browser (Chrome/Firefox/Safari/IE/Edge on Windows/MacOS/Linux). I know there’s a million browser automation testing tools out there, that’s not what I need.

Update

I put some more time into this and got some working options – see Record and Replay Browser Testing, Take 2!

Leave a comment

Filed under DevOps

Ask A Tech Manager

We had a really interesting joint CloudAustin/Austin DevOps meeting this week – a tech manager panel!  Ask them anything!  We had 92 people attend and we had to herd them all out of the building with sticks at the end because everyone had so many questions that even at two hours it was still going strong.

My takeaways:

  • Understanding managers’ viewpoint and goals is critical to an individual engineer when looking to get hired or promoted or whatever.
  • Managers want you to be successful – you being successful vs you not being successful (and/or leaving or being let go) is a huge win for them in all ways.
  • Looking for a job?
    • Your resume should be tuned to a job and either through its top section or a cover letter tell a story. Hiring managers get 100-200 applications/resumes per job. They don’t expect you to have everything on the job application – “50% is fine” was the consensus. But they’re doing a first cut before they talk to you, and a lot of people spam resumes, so if your resume doesn’t clearly say why you, add something that does.
    • For a Cloud Engineer position, for example, there’s a big difference between a random UNIX admin resume and a random UNIX admin resume that says “I’m excited about cloud and working towards an AWS certification on the side and really want to find a new job I can do cloud in.” The first one is discarded without comment, the second one can move ahead if they’re willing for people to learn – and given the “50%” thing, they are generally willing for people to lean, if those people are willing to learn!
  • Interviewing for a job?
    • Everyone hates every different interviewing tactic (see Hiring is Broken, and we have the Ultimate Fix) – individual interviews, panel interviews. whiteboard design, online coding, takehome projects, poring over your resume, checking your github, looking at your social meda/blog/whatever. But in the end hiring managers are just trying a handful of things to see if they can figure out if you know what you need to know to do the job.
    • They can’t just take your word for it – their reputation is on the line when they bring people in, and you reflect on them. They want to understand if you’ll be successful. Recruiting, hiring, onboarding cost a lot of money so they can only take so much of a risk – they’re not real particular what the form “proof you can do the job” takes, they’re just fishing for something.
  • Performance issues?
    • If you’re having some outside problem, talk to your manager. People we work with are a cross-section of just plain people, and we expect every medical, psychological, marital, criminal, etc. issue to show up at some point.
    • Communicate. Again, it’s in their best interest you succeed. No one “wants to get rid of you.”
    • Unreasonable demands?  Communicate.  Lots of technical staff work long hours or do the wrong things because they don’t “manage up” well and communicate.  “Hey, with this change I have way too much to get done in 40-50 hours, this is what I think my priority list is, this is what will fall off, what can we do about it?” Bosses don’t know what you’re doing every minute of the day and can’t read your mind. I’ve personally worked with engineers burning themselves out while their same-team colleagues aren’t because they aren’t managing themselves – though they think it’s outside forces bullying them.
  • How to get ahead?
    • Understand the options.  What are career paths there?  How do raises and promotions work? What are the cycles, pools, etc – you can only work the system if you understand the system. Managers are happy to explain.
    • Communicate.  No one knows if you want to move into management or not, or feel like you’re due a promotion or not, if you don’t talk about it with them over time. You have to take charge of your own development – companies want to help you develop but a manager has N reports and a lot of things to worry about, they aren’t going to drive it for you.
    • Listen.  What is needed to get to that Lead Engineer position?  “Slingin’ code” and “being here for 3 years” are, I guarantee, not much of that list of requirements. Usually things like leadership, image, communication, and exposure have a role. Read “How To Win Friends And Influence People” and stuff if you need to. “I just grind code by myself all day” is fine but there’s a max level to which you will rise doing so.

Anyway, thanks to all the managers that participated and all the attendees that grilled them!  I hope it helped people understand better how to guide their own careers.

Leave a comment

Filed under Management

Why Blaming “Human Error” Is Wrong

I’ve been writing a LinkedIn Learning course on postmortems lately and digging into all the fun research on the topic (Dekker, Hollnagel, and so on), and deepening my knowledge on the things I hope most of you know (root cause is a myth, blaming “human error” is wrong…).

I came across an example that really brought it home to me why the continuous blaming of human error is wrong – not “mean,” not “unenlightened,” but just plain logically ineffective.

One of the classic examples of design choices contributing to aviation accidents is the similarity and close placement of landing gear and flap controls in an airplane cockpits. Pilots lower the landing gear, then when they land they pull the flaps – but a small miss has them retract the landing gear instead and they pancake in.

In fact, the US Air Force did a study at the close of World War II where they looked back at all kinds of “pilot error” crashes and identified a bunch of design problems that contributed, and the flap/landing gear confusion was #2 on the list, forming 16%

Analysis of Factors Contributing to 460 ‘Pilot-Error’ Experiences in Operating Aircraft Controls,” by P.E. Fitts and R.E. Jones, USAF Aero Medical Laboratory, Memorandum Report, July 1947.

Then, 20 years later, a major study on the same topic

Aircraft Design-Induced Pilot Error, National Transportation Safety Board, Department of Transportation, Washington, D. C., PB # 175 629, July 1967.
And then, another 13 years later, suddenly they realized the same thing about small craft.

Well, there were very simple fixes mandated to this problem early on – the FAA now requires, for example, the landing gear control be shaped like a wheel and the flap control be shaped like a flap, so basic visual and tactile feedback is available to distinguish the two (especially at night, under stress, etc.).

Screen Shot 2019-09-04 at 5.00.37 PM

There’s other simple tricks that greatly reduce these accidents, like putting a catch on the landing gear retraction. Suddenly the “human error” goes away (leading to the reasonable question of how broadly we define “human error…”

So why did this “known” problem persist – damaging planes and killing people I might add – for 33 years?  In fact, it still happens, here’s a lovely writeup from 2015.

Basically, because all of the accidents where this happened continued to be declared “pilot error.” When there’s not any significant further inquiry (which to be fair, bothers people like airlines and the government and aircraft manufacturers and people with money), just saying it’s pilot error gets the problem over with by a minor sacrifice (a pilot) instead of doing any harder work.

At my new job, we’re doing risk analysis and commercial insurance for unmanned autonomous vehicles (drones). It’s interesting to now be working in a space that’s actually closer than tech to all this safety research. And you can see the same things happening.

Sure, actual research shows that it’s technical problems, not really human error, that is the problem most of the time – see

News article: More drone crashes caused by technical glitches, not human error, study shows.

Study: Exploring Civil Drone Accidents and Incidents to Help Prevent Potential Air Disasters

It cites technical problems as 64% of the time and frankly doesn’t really distinguish human error from design-induced human error.

But of course you can still just blame human error.  I smelled something questionable in the recent news reports of how the crew was to blame for a UK Watchkeeper drone crash. Oh sure, the drone failed to land correctly so the crew intervened, so it’s their fault.  I suspect if they had not intervened and it had continued to malfunction it would also be “their fault.” Here’s the full Ministry of Defense writeup, which goes to the “loss of situational awareness” synonym for human error. And here’s a later Register article with more details, like “The most appropriate [flight reference card] drill …stated: ‘If UA [unmanned aircraft] not maintaining centreline axis: Engine cut……..Command’.” and that they were under supervision of contractors from the drone manufacturer. Apparently following the actual designated drill recommendation of cutting the engine still makes it your fault.

Screen Shot 2019-09-05 at 9.51.07 AM

Of course, “Five drones – almost 10% of a 54-strong fleet bought from French firm Thales – have been wrecked in mid Wales crashes.”  Apparently they have a lot of navigation problems.  So when one goes to land in a populated area and is clearly not navigating properly and goes off the runway and the crew cuts the engine…  The crash is ‘their fault.’  Riiiiiight.

It’s actually a fascinating question – when drone operations are more and more autonomous, how long can we just hold the “crew” responsible for anything that goes wrong? It’s cheaper and less embarrassing, so I’m betting a good while.

For us in tech this opens up an interesting discussion, beyond the obvious statement of “if you do an incident postmortem and simply write it off to developer or operator error you aren’t doing your job”.

We can’t always fix all design issues immediately.  At what point, though, does not prioritizing a better-than-bandaid fix become negligence? “Thirty years,” like with the flap/landing gear thing?

There’s a lot of legislation that tries to protect the powerful from lawsuits etc. – but as autonomy becomes more common, how long will that last for us? Technology firms have managed to get out of being held responsible for endemic security flaws (largely thanks to Microsoft) for decades.

You can see this beginning to crumble in aviation with things like the recent Boeing 737 Max crashes. “It’s human error!” declares the Boeing CEO.  But people aren’t that dumb and the Internet helps information get out that was previously inaccessible.  So the next tack is blame the software, but that’s also buck-passing… The software was the band-aid fix on top of the design issues.

When will our lovely band of insulation finally be whittled away in tech? Soon, I’d bet… “Oh sure let’s crank out some self driving cars, I’m sure it’ll be fine and we can just give the standard ‘what me worry’ face when our crappy design ends up killing some soccer mom that we use when we mess up a software patch nowadays.”

In the end, if you are motivated by actual safety, or uptime, or security instead of the CYA game of who to blame, you have to push beyond the nearest human, or the outermost band-aid on your Rube Goldberg system, to improve the system. You’re going to have to consider how your design and interfaces of your software and the tooling you use to operate it contribute. You’re going to have to use facts and numbers and not soothing opinions to say “you know what? That goes wrong more than our other systems – there’s something wrong with it, we have to dig in and figure out what.”

Leave a comment

Filed under DevOps

Community First! Village

2019-06-08 10.21.02

DoD Organizer Family Tour

DevOpsDays Austin sponsored this great charity this year with our proceeds, and the program is so cool I wanted to do a whole post on it.

Community First! Village “is a 51-acre master planned community that provides affordable, permanent housing and a supportive community for men and women coming out of chronic homelessness.”  It consists of 200+ micro-homes and RVs and supporting infrastructure, they’re at 78% of capacity already, and they are planning for another 300 homes to be built. They’re located in southeast Austin out near the Travis County Expo Center.

DCIM100MEDIADJI_0012.JPG

Aerial View of Village

And it’s really nice! The primary kind of residence are little mini-houses, 180-200 square feet in size, with electricity but no plumbing.  There are standalone bathroom buildings with individual lockable rooms. There’s kitchen buildings for more extensive cooking. There’s RVs, more expensive but better for those with medical problems. There’s a community garden (with chickens and bees), a store, a hairdresser, a garage, a forge, and more.  Heck, there’s a bus stop and an Amazon dropbox.

Here’s a series of pictures I took on our tour.

This slideshow requires JavaScript.

Austin has around 2200 homeless, and the number continues to rise. My parents visited me in Austin a couple months ago, and we went out and ate and they were shocked by how many were on the street, especially as we drove through the “shelter district” downtown. There are many efforts to help, but this is an approach I hadn’t heard of before, and wanted to share with everyone.

How Does It Work?

Donna Emery, the Director of Development for Mobile Loaves & Fishes, gave us a tour and told us all about it. She’d love any of you to come tour the village as well! Mobile Loaves & Fishes as an organization has been serving the homeless for many years, and this is their deeply considered idea at making a permanent difference.

The village isn’t a shelter; it’s intended to be permanent. They identify candidates for the village via social workers and the array of people trying to help the increasing homeless population (there’s a database they all use to track homeless clients and try to get them services and such).  The person says they want to get into the Village, and there’s an about 12 month runway program to get them ready and in.

There are three rules to living in the village.

  1. Have to pay rent. Micro-homes rent for $275-$375/month, the RVs more like $435. They work to ensure they have their social services and encourage “dignified income” working in the village or otherwise. 96% of the residents pay their rent on time, which is better than your average apartment building!
  2. Have to follow civil law. This isn’t “anything goes”, and safety is paramount. They don’t turn you away if you have a alcohol or substance abuse problem – you’re only going to get over that if you have housing – but crime isn’t allowed. It isn’t a major problem for them; homeless are generally the victims, not the perpetrators, of crimes (other than the criminalization of being homeless, of course). Applicants do have criminal background checks – they don’t disqualify you out of hand for having a record though, but don’t allow sex offenders and evaluate a past of violent crime carefully.
  3. Have to follow the rules of the community (like a strict HOA) – you have to care for your neighborhood. This isn’t a jungle, it’s a community. The place was very clean and well tended. (Pets are welcome, though! We spoke with a man walking his dog at length on our tour.)

Last year, residents earned $650k in “dignified income” – working in the gardens, crafting, doing maintenance, working in the garage and market…  You can make $900/mo from a job cleaning the community bathrooms, for example. Donna stressed that they don’t rely on handouts – it harms the dignity of the people and you don’t take care of things that are free. When a major tech company donated a bunch of tablets, they set up a monthly tablet rental.  “But those are free, we’re giving them to you, don’t make money off them,” they initially complained. But MLF explained that handouts are an unhealthy dynamic, and this way the renters respect the tablets – and themselves – more. They’ve put a lot of thought and experience into creating a place where communities and lives can grow for people that have had nothing.

Of course, they provide a lot of help, from social services to things like teaching them to use Netspend for money management.

Blue ribbon Austin business and organizations have donated a lot of the infrastructure to make this work – Alamo Drafthouse, HEB, Charles Maund, the Topfer family, and many more.

Really A Community

But the thing I found the most striking about this is that it’s really a community, and a part of the larger community around it.

40% of the residents are women. There have been two weddings so far among the residents and two residents passed away with their wishes to be interred in the Village. The average age of homeless coming there is around 50 and they’ve been chronically homeless for around 10 years. This isn’t an attempt at “give them a shower and shave and get them a job and send them back out into the wild,” this is a permanent home where they can belong as long as they want. Donna shared with us that what really makes persistent homelessness is some kind of crisis combined with a collapse of a person’s social relationships – no family, no friends to help. Being sent away from a community doesn’t tend to form better social support, does it?

From their FAQ:

It’s all about relationships. Mobile Loaves & Fishes desires to empower the community around us into a lifestyle of service with the homeless. We achieve this vision through Community First! Village by taking a relational approach for connecting with our homeless brothers and sisters, instead of a transactional approach. When we bring an individual into community with others, we truly begin to make a sustainable impact on their lives.

Mobile Loaves & Fishes believes that the single greatest cause of homelessness is a profound, catastrophic loss of family. That’s why our focus at Community First! Village is to do more than just provide adequate housing. We have developed a community with supportive services and amenities to help address an individual’s relational needs at a fraction of the cost of traditional housing initiatives. We seek to empower our residents to build relationships with others, and to experience healing and restoration as part of engaging with a broader community.

DCIM100MEDIADJI_0643.JPGThe businesses aren’t just for the residents – you can go there to the garage and pay to get your oil changed.  You can go attend their movie nights (the Alamo donated a projector) that are open to the public like any movie night in any park. They do things like a trail of lights during the holidays. There’s plenty of reasons for non-residents to go there, it’s not a “camp.” It’s just a subdivision, really, like any other one you’d drive through in Austin.

DCIM100MEDIADJI_0173.JPGHeck, you can go live there. 170 of the occupants are former homeless, but there are also many “mission families” living there with them to provide help and more strongly tie them into the social fabric of the Austin community.  Or you can rent spare homes on AirBNB!  They have a hall (“Unity Hall”) that can accommodate up to 300 and there’s a commercial kitchen attached (also staffed by residents) so you can host events there – we started seriously looking at it for smaller tech events. (More pics are in the slideshow above).

How Can You Help?

Let’s get real.  If you’re reading this tech blog you’re probably incredibly well off. Working for a company that’s incredibly well off. We have an embarrassment of riches in the tech scene here in Austin, living next to people with nothing. In DevOps we talk continually about collaboration, sharing, and community – one would think that our appetite for helping the less fortunate would go farther than just making sure you get an underrepresented person on your next tech panel.

You can help with funding.  Their Phase II capital campaign is building more homes and supporting buildings, a clinic, and more. Eventually they want things like dental care (an especially hard problem; it’s relatively expensive but dental problems unheeded turn into medical problems quickly). You can give, you can encourage your company to give. DevOpsDays Austin made spare money from sponsors, so we were able to put $25,000 into sponsoring one of the homes in their next phase.

You can help by volunteering. Persons or groups can email them and get set up to come help!  Get your church or other organization involved. They’ve had over 100 Eagle Scouts do their projects out there.

You can help by participating in your local government.  They had a long battle to be able to start the village and had to locate outside the City of Austin because of the never-ending NIMBY-ism of residents not wanting “those people” anywhere near them. Advocate for compassion and the homeless in your city council and other venues.

CFV_14_ResidentYou can help even by just going there, using the businesses, interacting with the residents to weave them into the fabric of Austin. Go on a tour to see what they’re doing out there. Bring your kids! We all had a great and deeply moving family outing in our visit to the Village.

1 Comment

Filed under Conferences, DevOps

Incident Management Course Coming!

2019-06-13 10.51.48I know we’ve been quiet on the blog, all four agile admins have been busy – several of us moved to new jobs, everyone has a lot going on.

But we’re still doing stuff!  I just went out to Carpenteria to film a LinkedIn Learning course on Incident Management.  The agile admins have a full DevOps curriculum on LinkedIn Learning (which was lynda.com); most of them are in the “Become a DevOps Engineer” learning path!  You can view them as a LIL member or they can be bought individually nowadays too.

We’ve done the 101 level (DevOps Foundation), the 201 level (CI/CD, CM/Infrastructure as Code, SRE, Monitoring and Observability, Lean and Agile) and now we’re hitting more details – Karthik’s done a bunch of Kubernetes and Cloud Native courses, Peco is doing more monitoring courses, James is doing DevSecOps courses…

2019-06-13 12.28.02And I just went and filmed an Incident Management course.  Incident Response, really, I’m hoping for a subsequent course that focuses on retrospectives (each class is only like an hour long and retros are a huge fun topic so I wanted to give them enough time on their own).

Pictured are my producers Adam and Lori and my live action director Julia (who’s also done some of my other courses!) This was a slides course (my first), but they have a program where they can add in a little live action, and since I’ve done it a bunch and Julia’s great we burned through a bunch of scripts in a short time on camera! Thanks to all of them (and my content manager Brian Anderson, not pictured).

The Course

I’ve been creating IM processes and training and leading organizations in them for a while now. A good incident response program removes friction and lets your smart technical staff focus on one thing, solving the problem, without having to worry about what to do otherwise. When I left AlienVault, the #1 thing people came and said to me in my 2 week notice period was “Hey, that incident management process, that’s really made a huge difference,” which is great to hear.

And it was a good opportunity to refresh on the newer developments in the field.  I first got into modern IM, which I defines as “derived from the Incident Command System”, in 2008 after I heard Brent Chapman speak at Velocity on Incident Command for IT: What We Can Learn from the Fire Department.  But (aside from retros) while that concept spread, for 5-6 years there wasn’t really a lot more in terms of new developments. Luckily that’s changed, and there’s been a lot lately. John Allspaw and J. Paul Reed have both done masters’ theses with Lund University’s Division of Risk Management and Societal Safety; there’s a new O’Reilly book Incident Management for Operations as well as IM being a hot topic in the Google SRE books, and so on. The REdeploy conference and Thai Wood’s Resilience Roundup weekly email newsletter and the Oncall Nightmares podcast re full of late breaking developments. (These sources and more are listed in the course handout!)

Special thanks to J. Paul for giving me guidance on the course content and giving me permission to use his and Kevina Finn-Braun’s Incident Lifecycle Model in it.

Expect video topics like:

  • Why Do I Need Incident Management?
  • The Incident Command System
  • Scoping the Problem
  • Your Incident Toolchain
  • Incident Toolchain Example
  • Detecting and Reporting Incidents
  • First Response and Escalation
  • Incident Communication With Your Users
  • Communicating Inside Your Organization
  • Best Practices for Diagnosis and Repair
  • Cleaning Up After
  • Continuously Improving
  • Training and Game Days
  • Implementation Challenges

Oh, and I got to use props for the first time (like that fire extinguisher in the lead pic), we threw some in for kicks. Fun!

The Experience

Speaking of that, I just wanted to give the LinkedIn Learning team a shout-out.  Making courses with them is a great experience, class all the way.  They are all super skilled at what they do and super friendly. Going to their campus/studio in Carpenteria, CA is always an exceedingly pleasant experience. Everything’s top notch, sound booths, live action studios… It’s not the average webcam tech course when you’re looking down the barrel of a camera with a director, a producer, and a sound/teleprompter person fussing over the fine details! If you are an expert in something (not just tech) and are interested in doing courses, I’m happy to introduce you to someone there; it’s all top quality.

And they treat their people well there!  As best as I can tell they always have, from when they were Lynda to when they were LinkedIn to now being owned by Microsoft. Lori confided in me, “I was a documentary filmmaker with a non-profit for years and I didn’t know jobs like this existed; I’ve never been treated so well.”

While I was there they were doing their monthly “InDay”, and apparently this is the most anticipated one of a year as it’s game themed. They had inflatable human foozball, arcade games, did up the cafeteria with a Stranger Things theme, even had a D&D training session.

 

2019-06-13 17.33.21And of course Carpinteria is beautiful, right on the beach, extremely temperate. It’s between Ventura and Santa Barbara, just north of LA. If you go out there, my hot tips are the nearby Shoals restaurant (a little down the 101) where you can get a table right on the water, and Chocolats du CaliBressan, a French chocolatier down in the far north end the beach side of Carpinteria. Oh and the booze is super cheap in the supermarket, so we always make some gin and juice and hang out in the Holiday Inn’s hot tub while we’re there…

 

2 Comments

Filed under DevOps