Author Archives: Ernest Mueller

Ernest Mueller's avatar

About Ernest Mueller

Ernest is the VP of Engineering at the cloud and DevOps consulting firm Nextira in Austin, TX. More...

DORA AI Coding Report Breakdown

There’s a new DORA report out from Google, but it’s not the usual DevOps one we’ve come to expect – this one is entirely focused on the state of AI-assisted software development.

That’s not too surprising, straight up DevOps is last decade’s news – Gene Kim rebranded the DevOps Enterprise Summit and is publishing vibe coding books, the DevOps OGs like Patrick Debois and John Willis have been focusing on AI building, and so it makes sense that the DORA crew are also poking in that direction.

A lot of the shift in DevOps in recent years has been towards focusing on developer productivity. Whether that’s the rise of platforms to take burden and complexity away from devs, to Nicole Forsgren’s new SPACE metrics that extended her previous Accelerate/DORA metrics that were focused just on software delivery, everyone is keenly aware that unlocking the developers’ ability to create is important.

Companies I work with are really prioritizing that. At ServiceNow, they got Windsurf licenses for all and report a 10% productivity boost from it. And just “we have some AI” isn’t enough, Meta just cut one of their major AI teams because they had “gotten too bureaucratic” and slow so they wanted to move people to a newer team where they could get more done. So companies are taking developer productivity very seriously and spending real money and making big changes to get it.

Understanding Your Software Delivery Performance

As you read the report, you’ll notice that large chunks of it are NOT about AI directly. This first chapter, for example, recaps the important areas from previous DORA reports. It talks about metrics for software delivery and characterizes kinds of teams you see in the wild and their clusters of function and dysfunction. You don’t really get to AI till page 23.

Is this “AI-washing”? If so, it’s justified. People want “AI” to be the solution when they don’t understand their problem, or how to measure whether their problem is solved – AI can help with software engineering and DevOps but it does nothing to change the fundamental nature of any of it, so if you don’t understand the non-AI basics, if you’re handed AI to loose on your company you may as well be an armed toddler.

AI Adoption and Use

The report has good stats that dig deeper than news reports – while 90% of people are “using AI”, in general they use it maybe 1-2 hours out of their day and don’t go to it first all the time.

The thing I found the most surprising was what people were using it for. In my experience folks are using AI for the lighter work more often than actually writing code, but their research showed writing code was by far the most common use case (60%) and stuff like internal communication the least common task (48%) (outside calendar management at 25%, but the tools for that are terrible IMO).

Chatbots and IDEs are the vast majority of how people interact with AI still, integrated tool platforms only have 18% traction.

People do in general believe they’re being more productive from using AI, by a wide margin, and also believe their code quality has gone up! Pure vibe coding makes terrible quality code, I believe this is because how real coders are using AI is more thoughtful than just “write this for me.” And this is borne out in their trust metrics – most people do NOT trust AI output. 76% of respondents trust AI somewhat, a little, or not at all – despite 84% believing it has increased their productivity.

I think that’s super healthy – you should not trust AI output, but if you keep that in mind, it lets you use it and be more productive. You just have to double check and not expect magic. Consider that ServiceNow article I linked above about their Windsurf adoption, it’s not reastic to think AI is going to give you orders of magnitude of coding productivity increase – 10% is great though, more of an improvement than most other things you can do!

AI and Key Outcomes

That leads us into the meatier portion of the report, which is taking the research past “what people think” and trying to correlate real outcomes to these factors. Which is a little ticky, because developer morale is a part of what contributes to delivery and there may be a “placebo factor” where believing AI tools are making you better, makes you better whether or not the tool is contributing!

What they found is that while AI use does really improve individual effectiveness, code quality, and valuable work, it doesn’t help with friction and burnout, and has a significant negative effect on software delivery instability.

So what do we make of increased software delivery instability when we think we’re generating more and better code? And we think the org performance is still doing better? The report doesn’t know either.

My theory is similar to the answer to “why doesn’t everyone run multi-region systems when AWS us-east goes down from time to time?” Just to refresh you on the answer to that one, “it’s more expensive to do it right than to have an outage from time to time.” If you can cram more code down the pipe, you get more changes and therefore more instability. But just like companies gave up on shipping bug-free code long ago, some degree of failure with the tradeoff of shipping more stuff is a net financial win.

AI Capabilities Model

The reason I love DORA is they go deep and try to establish correlation of AI adoption best practices to outcomes. At page 49 is their big new framework for analysis of AI impact on an org. Here’s what they have so far on how specific practices correlate to specific outcomes, with caveats that it’ll take another year of data to know for sure (though AI innovation cycles are month by month, I hope they’ll find a way to get more data more quickly than a yearly cadence).

Platform Engineering

The report then takes another turn back to earlier DORA topics and talks about platform engineering, the benefits, and how to not suck at it.

For those who are unclear on that, you get wins from a platform that is user centric. So many organizations don’t – or deliberately mis- – understand that. You could call all the old centralized IT solutions from previous decades a “platform” – Tivoli, HP WhateverCenter, and so on – but they were universally hateful and got in the way of progress in the name of optimizing the work of some commodity team behind a ticket barrier. (I’ll be honest, there’s a lot of that at my current employer.)

I’m going to go a step farther than the report – if you don’t have a product manager guidlign your platform based on its end users’ needs, your platform is not really a platform, it’s a terrible efficiency play that is penny wise but pound foolish. Fight me.

Anyway, they then say “platforms, you know, it’s the place you can plug in AI.” Which is fine but a little basic.

Value Stream Management

Is important. The premise here is that given the basic premise of value flow (if you don’t know about lean and value streams and stuff, I’ve got a LinkedIn Learning course for you: DevOps Foundations: Lean and Agile), systems thinking dictates that if you accelerate pieces in your workflow you can actually harm your overall throughput, so major changes mean you need to revisit the overall value stream to make sure it’s still the right flow, and measure so you understand how speeding up pieces (like oh say making code) affects other pieces (like oh say release stability).

They find that AI adoption gets you a lot more net benefit in organizations that understand and engineer their value stream.

The AI Mirror

This section tries to address the mix of benefits and detriments we’ve already talked about with AI. It basically just says hey, rethink how you do stuff and see if you can use AI in a more targeted way to improve the bad pieces, so for software delivery try using it more for code reviews and in your delivery pipelines. It’s fine but pretty handwavey.

That’s understandable, I don’t think anyone’s meaningfully figured out how to bring AI to bear on the post-code writing part of the software delivery pipeline. There’s a bunch of hopefuls in this space but everything I’ve kicked the tires on seems still pretty sketch.

Metrics Frameworks

You need metrics to figure out if what you’re doing is helping or not. They mention frameworks like SPACE, DevEx, HEART, and DORA’s software delivery metrics, and note that you should be looking at developer experience, product excellence, and organizational effectiveness. “Does AI change this?” Maybe, probably not as much as you think.

And that’s the end at page 96, there’s 50 pages of credits and references and data and methodology if you want to get into it.

Those last 4 chapters feel more like an appendix, they don’t really flow with the rest of the report. The AI methodology talks about things to do specifically boost your AI capabilities (Clear and communicated AI stance… Working in small batches) which somewhat overlap (Quality internal platforms, User-centric focus) with these later chapters but to a degree don’t. If value stream management is shown to improve your AI outcomes then – why’s it not in the capability model?

I assume the answer is, to a degree, “Hey man this is a work in progress” which is fair enough.

Conclusion

I find two major benefits from reports like this, and judge their success based on how well they achieve them.

  1. Showing clear benefits of something, so you can use it to influence others to adopt it. This report does very well there. One of my complaints about the DORA reports is that in recent years they’d become more about the “next big thing” than about demonstrating the clear benefits of core DevOps practices, so I’d often go back and refer to older reports instead of the newer ones. But here – are people getting benefit from AI? Yes, and here’s what, and here’s what not. Very cleaar and well supported.
  2. Telling you how to best go about doing something, so you can adopt it more effectively. The report also does well here, with the caveat of “so much of this is still emerging and moving at hyperspeed that it’s hard to know.” They’ve identified practices within AI adoption and in the larger organization that are correlated to better outcomes, and that’s great.

And I do like the mix of old and new in this report. You have to wave the new shiny at people to get them to pay attention, but in the end there are core truths about running a company and a technology organization within a company – value streams, metrics, developer experience, release cadence and quality – that AI or any new silver bullet may change the implementation of, but does not change fundamentally, and it’s a good reminder that adopting sound business basics is the best way to take advantage of any new opportunity, in this case AI.

TL;DR – Good report, use it to learn how people are benefitting from AI and to understand specific things you can do to make your organization benefit the most from it!

Leave a comment

Filed under AI, DevOps

The AI Ate My Homework

The evolution of responsibility shifting over the millenia culminating in "the AI says" today.
Image made by AI! Oh, the irony!

Boy, that AI, it’s to blame for so many things isn’t it!

  • AI causes layoffs!
  • AI causes bad journalism!
  • AI causes headaches for recruiters!
  • AI causes ethics violations!

But here’s the truth, AI doesn’t cause any of this – people do.

All these things are human decisions. “AI” didn’t cause that layoff or any of these other things. It’s a handy technology, and just like the PC or mobile, it allows us to disrupt and change things, but it’s not “causing” anything.

And to be honest, most “AI-caused layoffs” are a barefaced lie. Just like most RTO mandates are clandestine layoffs, CEOs have realized that you can now have a layoff or hiring freeze and say “because of our use of AI” and your stock goes up instead of down. So that’s what they say. Speaking as a technology consultant whose team is involved in a lot of large AI implementations, except for large call center or “digital piecework” shops, no one is really using AI enough already that it’s truly laid off large numbers of staff, and certainly all developer layoffs attributed to AI so far are layoffs they just wanted to do regardless.

Every decade has its new technology silver bullet. In the after-times, we remind ourselves that there is no such thing as a silver bullet and it’s always more complicated than that. Then we forget and fall in love with the newest silver bullet, and the shinest teflon-coated hollow-point silver bullet is AI. But that’s just tech business as usual.

The core problem here is that personification of AI is very deliberately being used to shift blame and pass the buck.

Remember when Mark Zuckerberg went before Congress and kept trying to blame “the algorithm” for fanning political extremism, as if it wasn’t just software his company had built? This is the same playbook, scaled out, substituting in “AI” as the boogeyman.

Whenever any AI exec talks about AGI or “we the tech oligarchs are also scared of AI, it may be alive and coming to get us!” they are very deliberately trying to play a shell game of brandishing something that you all can identify as a blame-bearing entity in front of them so that they, their company, and their practices can have plausible deniability. No no, it’s “the AI” that stole all of your IP, or told you to drive your car off a cliff, not us. But the only reason you think it’s different than “a person at that company told you to do that” is the use of disinformation, wealth, and power to snooker you.

In many use cases, AI isn’t better, it’s just cheaper, and it’s only cheaper right now when trillions of dollars are being poured into defraying its cost. No one wants an AI drive-through – it’s slower and harder to fix mistakes. Companies only pitch it as the ‘better option’ when the alternative is an underpaid, burned-out worker. That’s a false choice.

This false choice is set up for you by companies that don’t care about your experience but want to save money. They’re the ones who only want to pay minimum wage, or have your support phone calls go to another country, or whatnot, and have already done that at the expense of you as a customer. The real option, doing it well in the first place, is always on the table. Don’t let them play you like a toddler and ask you “which of these two shirts you want to wear today.”

Over the millennia, people have always found a scapegoat for exploitation. First it was “the gods,” then “the laws,” then “the algorithm,” and now it’s “the AI.” Always keep in mind that all of these are only masks for one person – The Man. It’s not the mask taking the actions, it’s the people hiding behind them. Don’t get suckered into the shell game. Hold the real decision-makers accountable.

Demand better. Don’t be decieved into framing decisions and outcomes as being “caused by” AI. Journalists are the *worst*, they gave up on real reporting already in favor of “what someone said on Twitter,” and they are now happily replacing that with “ChatGPT says”, as if that’s thoughtful analysis.

Leave a comment

Filed under AI

OpenAI vs Opus – the Penguin

How does OpenAI compare versus a penguin in the role of a legal secretary? Let’s see…

import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.getenv('OPENAI_API_KEY')

def get_completion(prompt, model="gpt-3.5-turbo", temperature=0): 
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature, 
    )
    return response.choices[0].message["content"]

prompt = f"""
Translate the following letter from a legal firm to a delinquent client, Fred 'The Cincinnati Strangler' Johnson, from hostile slang to a friendly business letter: 
'Attention criminal pervert: Where the hell is my dough, you twisted goon?  Don't forget who got you sprung on that technicality.  I want the money.  Now.  P.S. Have you strangled your wife yet, psycho-brain?'
"""
response = get_completion(prompt)
print(response)
Dear Mr. Johnson,

We are writing to follow up on the outstanding balance on your account. As your legal representatives, we were able to assist you in your legal matters and we kindly request that you fulfill your financial obligations to us promptly.

Please be advised that we require payment as soon as possible. We appreciate your attention to this matter and look forward to receiving your payment promptly.

Thank you for your cooperation.

Sincerely,

I still have to give it to Opus by a flipper, mainly for the P.S., but OpenAI got into the right general headspace! And it’s too hot here in Texas for me to keep a penguin around to translate what I want to say into more acceptable terms, and I sure get tired of doing it.

I miss my old Eudora email client that would put little hot peppers next to my email if it thought it was too spicy…

1 Comment

Filed under AI, General

What Is A Blocker?

A recent project delay at work put me in mind of this recurring issue I’ve seen with a lot of agile teams. That is, reluctance to call something a blocker. Karthik and I are working on a new revision of our LinkedIn Learning courser “DevOps Foundations: Lean and Agile” so I decided to dig into this a bit.

I think many engineers believe a blocker is “something that prevents me from doing any work whatsoever on this entire project.” This in my experience leads to a lot of project delays and unaddressed issues because something was not identified and communicated widely enough to be swiftly resolved.

This is unfortunately encouraged by some Agile wonks who start hairsplitting with terms. “Well, there’s a blocker and then there’s an impediment,” they say. As I google this, it turns out you can differentiate between “delays, impediments, blockers, and roadblocks. Oh, and dependencies.” And boards, reports, etc. have to do/in progress/done and blockers, not 10 other categories.

Here’s the deal. Most teams out there are not formally trained on PM or agile and have essentially figured out what they know via osmosis. And there’s one term they even vaguely understand, which is “blocker”. (I have never seen a team distinguish formally between blockers and impediments in decades of doing this.).

While I love wordplay as much as the next person, I don’t think this attempt to categorize bad things on the infinite spectrum of bad things is practical, and best belongs as an organic explanation of the impact. Does it prevent you from proceeding on that piece of work? On any work? You can proceed on it but it can’t become done until the thing is resolved? Or it doesn’t technically stop work but it does put the project a week behind? Sure, say it. It’s important to know the impact but it does not change the nature of the existence of an issue and the need to swarm on or escalate it.

The practical definition of a blocker from a team member level is “anything not entirely in my control that is stopping or delaying work now or in the very near future.”

The practical definition of a blocker from a management point of view is “anything that is getting in the way of the team that I need to know about or do something about.”

We have to go back to the entire reason to have a term like “blocker”, which is to allow the team, or failing that their management escalation, to resolve issues that prevent the continued timely flow of work. Period, end of story, if process definition hairsplitting isn’t serving that core goal then do what does.

Definition people love to say something’s not “technically a blocker – yet.” “Well not having a cloud accout to use as the required test fixture isn’t technically a blocker because I won’t need it for another two days, even though there’s been no obvious headway on the request we made to IT for it.” Can anyone seriously contend that’s not a blocker? It’s a problem that is clearly visible on the road ahead, you don’t have to run into it first like my cheap Roomba does in order to escalate it, and doing so is antithetical to the overall agile goal of ensuring smooth and continuous flow. I don’t tolerate “technically true” when it becomes “wilfully dumb.”

Underlying this seems to be some unstated assumption that blockers are “bad” and you are bad for having one or reporting one. And I get it, there’s plenty of bad scrum masters/managers/etc. out there that operate unthinkingly on some Neanderthal level and react as “person say thing I don’t like, person is bad.” (Or the modern tech bro Neanderthal who has some variation of this like “well I need to discourage people from reporting blockers to make them use their masculine energy to pull themselves up by their bootstraps blah blah.”) Sure, toxic people can drive any process off track, that shouldn’t be the default however.

I believe in a healthy Agile environment team members should be encouraged to bring up anything threatening to slow or stop work. It can be a small thing, but it creates an opening for help from the team. Even “I haven’t used this tool before and it’s taking a little longer and I am not sure I’ll get this task done by sprint end” – that’s an opportunity for someone to hop on for a half hour to pair with you or train you.

If you’re off schedule there’s some blocker around, whether it can be handled in the team or needs escalation. You don’t have to escalate everything, though even if the team handled it, schedule or other impacts need to be communicated. For escalations, make it clear it’s an escalation and who to, don’t just assume everyone who gets a status report will seize on all the blocker lines as to dos. “Blocker: No headway on Azure text fixture, it’s needed to complete our work this sprint and will delay us if it’s not in place in 2 days – @ernest we need your help with this one” is perfect.

As someone providing oversight for a lot of sprint teams working on consulting engagements often with client-prescribed milestone deliverables, I keep getting into situations where a sprint full of reporting “no blockers” suddenly turns into “well but of course we won’t have any of the deliverables at sprint end tomorrow.” That makes everyone unhappy, especially me if it’s something I could have urged the client or an external team to provide for the team more promptly. Give people the opportunity to intervene to keep you on track!

I’m going to start adding “definition of blocker” right after “definition of done” in kickoff discussions because of how chronic this issue is – I venture to say I’ve seen it everywhere, it’s just more tolerated in environments where schedules aren’t taken too seriously.

Let me know how you handle this issue, if you encourage a wide definition of blocker, and your experiences on this!

Leave a comment

Filed under Agile, Management

The Right Way To Use Tagging In The Cloud

This came up today at work and I realized that over my now-decades of cloud engineering, I have developed a very specific way of using tags that sets both infra dev teams and SRE teams up for success, and I wanted to share it.

Who cares about tags? I do. They are the only persistent source of information you can trust (as much as you can trust anything in this fallen world) to communicate information about an infrastructure asset beyond what the cloud or virtualization fabric it’s running in knows. You may have a terraform state, you may have a database or etcd or something that knows what things are – but those systems can go down or get corrupted. Tags are the one thing that if someone can see the infrastructure – via console or CLI or API or integrated tool – that they can always see. Server names are notoriously unreliable – ideally in a modern infrastructure you don’t reuse servers from one task to another or put multiple workloads on one, but that’s a historical practice that pops up all to often, and server names have character limits (even if they don’t, the management systems around them usually enforce one).

Many powerful tools like Datadog work by exclusively relying on tags. It simplifies operation and prevents errors if, when you add a new production app server, that automatically gets pulled into the right monitoring dashboards and alerting schemes because it is tagged right.

I’ve run very large complex cloud environments using this scheme as the primary means to drive operations.

Top level tag rules:

  1. Tag everything. Tagging’s not just for servers. Every cloud element that can take a tag, tag. Network, disk images, snapshots, lambdas, cloud services, weird little cloud widgets (“S3 VPC endpoint!”).
  2. Use uniform tags. It’s best to specify “all lower case, no spaces” and so on. If people decide to word a tag slightly differently in two places, the value is lost. Both the key and the value, but especially the key – teach people that if you say “owner” that means “owner” not “Owner” and “owning party” and whatever else.
  3. Don’t overtag with attributes you can easily see. Instance size, what AZ it’s in, and so on is already part of the cloud metadata so it’s inefficient to add tags for it.
  4. Use standard tags. This is what I’ll cover in the rest of this article.

At the risk of oversimplifying, you need two things out of your systems environment – compliance and management. And tags are a great way to get it.

Compliance

Attribution! Cost! Security! You need to know where infrastructure came from, who owns it, who’s paying for it, and if it’s even supposed to be there in the first place.

Who owns it?

Tag all cloud assets with an owner (email address) basically whatever is required to uniquely identify who owns an asset. Should be a team email for persistent assets, if it’s a personal email then the assumption should be if that person leaves the company those assets get deleted (good for sandboxes etc). 

The amount of highly paid engineer time I’ve seen wasted over the last decade of people having to go out and do cattle calls of “Hey who owns these… we need to turn some off for cost or patch them for security or explain them for compliance… No really, who owns these…” is shocking.

owner:myteam@mycompany.com

Who’s paying for it

This varies but it’s important. “Owner” might not be sufficient in an environment – often some kind of cost allocation code is required based on how your company does finances. Is it a centralized expense or does it get allocated to a client? Is it a production or development expense, those are often handled differently from a finance perspective. At scale you may need a several-parter – in my current consulting job there’s a contract number but also a specific cost code inside that contract number that we need all expenses divvied up between.

billing:CUCT30001

Where did it come from

Traceability both “up” and “down” the chain. When you go look at a random cloud instance, even if you know who it belongs to you can’t tell how it got there. Was it created by Terraform? If so where’s the state file? Was it created via some other automation system you have? Github? Rundeck? Custom python app #25?

Some tools like Cloudformation do this automatically. Otherwise, consider adding a source tag or set of tags with sufficient information to trace the live system back to the automation. Developers love tagging git commits and branches with versions and JIRA tickets and release dates and such, same concept applies here. Different things make sense depending on your tech stack – if you GitOps everything then the source might be a specific build, or you want to say which s3 bucket your tfstate is in… Here as an example, I’m working with a system that is terraform instantiated from a gitops pipeline so I’ve made a source tag that says github and then the repo name and then the action name. And for the tfstate I have it saved in an s3 bucket named “mystatebucket.”

source:github/myapp/deploy-action
sourcestate:s3/mystatebucket

When does it go

OK, I know the last two sound like the lyrics to “Cotton-Eyed Joe”, which is a bonus. But a major source of cost creep is infrastructure that was intended to be there for a short time – a demo, a dev cycle – that ends up just living forever. And sure, you can just send nag-o-grams to the owner list, but it’s better to tag systems with an expires tag in date format (ideally YYYY-MM-DD-HH-MM as God intended). “expires:never” is acceptable for production infrastructure, though I’ve even used it on autoscaling prod infrastructure to make sure systems get turned over and don’t live too long.

expires:2025-02-01-00-00-00
or
expires:never

Management

Operations! Incidents! Cost and security again! Keep the entire operational cycle, including “during bad production incidents”, in mind when designing tags. People tear down stacks/clusters, or go into the console and “kill servers”, and accidentally leave other infrastructure – you need to be able to identify and clean up orphaned assets. Hackers get your AWS key and spin up a huge volume of bitcoin miners. Identifying and actioning on infrastructure accurately and efficiently is the goal.

As in any healthy system, the “compliance” tags above aren’t just useful to the beancounters, they’re helpful to you as a cloud engineer or SRE. But beyond that, you want a taxonomy of your systems to use to manage them by grouping operations, monitoring, and so on.

This scheme may differ based on your system’s needs, but I’ve found a general formula that fits in most cases I come across. Again, it assumes virtual systems where servers have one purpose – that’s modern best practice. “Sharing is the devil.”

EARFI

I like to pronounce this “errr-feee.” It’s a hierarchy to group your systems.

  • environment – What environment does this represent to you, e.g. dev, test, production, as this is usually the primary element of concern to an operator. “environment:uat” vs “environment:prod”.
  • application – What application or system is this hosting? The online banking app? The reporting system? The security monitoring server? The mobile game backend? GenAI training? “application:banking”.
  • role – What function does this specific server perform? Webserver dbserver, appserver, kafka – systems in an identical role should have identical loadouts. “role:apiserver” vs “role:dbserver”. Keep in mind this is a hierarchy and you won’t have guaranteed uniqueness across it – for example, “application:banking,role:dbserver” may be quite different from “application:mobilegame,role:dbserver” so you would usually never refer to just “role:dbserver.”
  • flavor – Optional, but useful in case you need to differentiate something special in your org that is a primary lever of operation (Windows vs Linux?  CPU vs GPU nodes in the same k8s cluster? v2 vs v2?). I usually find there’s only one of these (besides of course region and things you shouldn’t tag because they are in other metadata). For our apiserver example, consider that maybe we have the same code running on all our api servers but via load balancer we send REST queries to one set and SOAP queries to another set for caching and performance reasons. “flavor:rest” vs “flavor:soap”.
  • instance – A unique identifier among identical boxes in a specific EARF set, most commonly just an integer. “instance:2”. You could use a GUID if you really need it but that’s a pain to type for an operator.

This then allows you to target specific groups of your infrastructure, down to a single element or up to entire products.  

  • “Run this week’s security patches on all the environment:uat, application:banking, role:apiserver, flavor:rest servers.” Once you verify, you can do the same on environment:prod.”
  • “The second of the three servers in that autoscaling group is locked up. Terminate environment:uat, application:banking, role:apiserver, flavor:rest, instance:2
  • “We seem to be having memory problems on the apiservers. Is it one or all of the boxes? Check the average of environment:prod, application:banking, role:apiserver, flavor:rest and then also show it broken down by instance tag. It’s high on just some of the servers but not all? Try flavor:rest vs flavor:soap to see if it’s dependent on that functionality. Is it load do you think? Compare to the aggregate of environment:uat to see if it’s the same in an idle system.”
  • “Set up an alert for any environment:prod server that goes down. And one for any environment:prod, application:banking, role:apiserver that throws 500 errors.”
  • “Security demands we check all our DB servers for a new vulnerability. Try sending this curl payload to all role:dbservers, doesn’t matter what application. They say it won’t hurt anything but do it to environment:uat before environment:prod for safety.”

So now a random new operator gets an alert about a system outage and logs into the AWS console and sees not just “i-123456 started 2 days ago,” they see

owner:myteam@mycompany.com
billing:CUCT30001
source:github/myapp/deploy-action
sourcestate:s3/mystatebucket

expires:never
environment:prod

application:mobilegame
role:dbserver
flavor:read-only
instance:2


That operator now has a huge amount of information to contextualize their work, that at best they’d have to go look up in docs or systems and at worst they’d have to just start serially spamming. They know who owns it, what generates it, what it does and has hints at how important it is. (prod – probably important. A duplicate read secondary – could be worse.) And then runbooks can be very crisp about what to do in what situation by also using the tags. “If the server is environment:prod then you must initiate an incident <here>… If the server is a role:dbserver and a role:read-only it is OK to terminate it and bring up a new one but then you have to go run runbook <X> and run job <y> to set it up as a read secondary…”

Feel free and let me know how you use tags and what you can’t live without!

Leave a comment

Filed under Cloud, DevOps, Monitoring

Career Tip – Managing Up

“What is my manager’s deal, anyway?”

Here’s some career advice that can help you build a more effective relationship with your manager. Remember, they may be a manager but they don’t know everything, or everything that you do, and they are navigating work and life with just as much trepidation as you are! If you haven’t been a manager, it’s sometimes hard to understand why they’re doing what they are and how to best work with them to make both of you happy. So you want to figure out how to “hack” your manager by managing up!

For many years I treated my managers as random-weird-request generators, and frequently worked at cross-purposes with them. until I got advice on managing up and it helped my career.

Managing up, or managing your manager, is an important skill that can contribute to a more productive and positive work environment. Here are some key pieces of advice to effectively manage up:

  1. Understand your manager’s priorities and expectations: Take the time to understand your manager’s goals, preferred communication style, and expectations. Ask them if it’s not obvious! This knowledge will help you align your work and approach accordingly, or at least find a happy medium. (Feel free and tell them the same about you!) Managers usually have a very specific reason for why they’re asking for something and why they are stressing the things they’re stressing; understanding why is the key to understanding them.
  2. Build a strong relationship: Develop a positive and professional relationship with your manager. Be proactive in seeking feedback, understanding their working style, and demonstrating your commitment to achieving shared goals. Our managers try to share the context of what needs to happen with everyone so that they can go do it with autonomy, so reflecting your understanding of and commitment to what’s going on at a high level helps them empower you. If you can help them achieve their goals via a plan you put together, it prevents them needing to “micromanage” by also dictating how to get there.
  3. Communication is key: Maintain open and regular communication with your manager. Keep them informed about your progress, challenges, and any important updates. Be clear, concise, and respectful in your communication, and adapt your style to match your manager’s preferences – remember they have a bunch of people they are trying to wrangle to understand the state of a lot of projects.
  4. Anticipate needs and be proactive: Try to anticipate your manager’s needs and take proactive steps to address them. Take initiative, suggest solutions to problems, and offer assistance when appropriate. Show that you are capable of working independently and taking ownership of your responsibilities.
  5. Make clear asks: Your manager is there to get you what you need to do your job and be happy and healthy. But everyone is different. They don’t know how you prefer to get recognized, or what kind of projects you want to work on, or resources you think you need to be successful… So tell them! They should be trying to figure it out by asking you too, but “communication is hard” and people often make assumptions based on a given situation or communication that may or may not reflect your needs.
  6. Provide solutions, not just problems: When you encounter challenges or issues, avoid simply presenting the problems to your manager. Instead, propose potential solutions or alternatives. This demonstrates your critical thinking and problem-solving skills, and it lightens the burden on your manager by offering actionable suggestions. If you don’t have a good solution to a specific issue it’s fine, but sometimes a manager can become dismissive of someone who “just complains all the time” because it adds work to a limited time without any help.
  7. Seek and act on feedback: Actively seek feedback from your manager on your performance and areas for improvement. Be open to constructive criticism and use it to grow and develop professionally. Demonstrate your willingness to learn and make necessary adjustments based on the feedback received.
  8. Manage your time effectively: Prioritize your tasks, set clear goals, and manage your time efficiently. This will help you meet deadlines, deliver quality work, and reduce the need for constant supervision. Ask if priorities or timings aren’t clear. Your manager dearly wants everyone to be able to do their own thing without any intervention but is held responsible by upper management for outcomes and project schedules/profitability.
  9. Be a team player: Collaborate and foster positive relationships with your colleagues. Support your teammates, share knowledge, and contribute to a cooperative and harmonious work environment. Show that you can work well with others and contribute to the overall success of the team.

Managing up is not about manipulation or trying to control your manager. It’s about building a strong working relationship based on trust, effective communication, and mutual respect. By demonstrating your competence, reliability, and commitment, you can effectively manage up, have the trust and proactive support of your manager, and contribute to your professional growth and success.

(This article partially written by ChatGPT!)

Leave a comment

Filed under Management

The Value Of Gratefulness


I have been in technology management for more than 20 years now and have worked in a wide variety of shops, and I think I’ve identified a key element that creates a good leader, and that is gratefulness.

Gratefulness Empowers Recognition

Everyone knows that “employee recognition” is important for morale; any company cites it as a priority whether they are really doing it or not. Sometimes it just gets forgotten – but sometimes there’s excuses given not to do it, concerns that it “sounds artifical” or that “they get thanks in form of their salary” or “people will be uncomfortable or jealous.” And some people honestly have a hard time doing it.

I’ve found that those that cultivate an actual spirit of gratefulness within them for other peoples’ work, especially for those who work for you and the sweat of their brow contributes to your success and growth, have an easier time of it.

This shouldn’t be a surprise. The classic Dale Carnegie book How To Win Friends and Influence People is often categorized as a “sales book.” It’s not, it’s way more profound than that and deserves a place in any leader’s library. In its introduction there’s explicitly a story of a man with 314 employees who did nothing but criticize them, then studied the book’s principles, and subsequently turned around his management strategy so he had 314 friends and not 314 enemies, leading to both increased happiness and increased profitability. And Part 2 of the book quickly gets to the “how” – it starts with “Become genuinely interested in other people” and ends with “Make the other person feel important – and do it sincerely.”

I don’t think it’s a shocking revelation that gratefulness leads to better recognition and therefor to better morale, but what I want to get across here is that even if you’re not good at that out of the gate, it can be learned.

And once you learn it, you get more help from other people.

I personally grew up as a very introverted person who was happy alone and on the computer, and not being very interested in others. But in my early career I quickly saw that was holding me back. I wanted to change it so I read How To Win Friends and Influence People and tried to put it into action. Awkwarly and self-consciously at first, of course.

Then something strange started happening to me. People I didn’t know would turn and talk to me in the elevator! I was, frankly, shocked. Generally in my life up to that point, in public I left people alone and they left me alone. I came to the realization that even my demeanor had changed and was more open somehow, and it was causing people I didn’t even know and wasn’t intending to interact with to feel like they could interact with me. And not to hassle me, but to help me.

Ungratefulness Leads To Bad Decisions

For many years I thought that gratefulness was just something that made you friendlier and made recognition easier and so was good in the long term. But then I worked at a startup where the CEO had a deep, fundamental lack of gratefulness, and I saw how that leads to critically bad decisonmaking.

Because people, and people’s work, have value – not in some hug-filled hippie sense, but in a very tangible sense. At the company in question the CEO came to me several times wanting to fire an engineer who had legit written 80% of the working product code in the shop “because he doesn’t think architect level.” He ousted a co-founder who was the only person who had actually brought in sales for the company. So years later it was a startup that had trouble even creating a shipping product and certainly wasn’t growing revenue, and had – seriously estimating – about 300% employee turnover in its lifetime. He sabotaged his own company because he couldn’t look at even objective value creation (working code! shipping product! sales revenue!) and value those who generate it at all.

That really made me stop and think. The stereotype of the ungrateful leader is one that only values hard objective results and “is mean” to people otherwise, but my experience has led me to the conclusion that’s a false dichotomy – if you are unable to see value you’re going to be unable to see it whether it’s in a person or in github or on a ledger book. Especially in a sector where that value is being created by the skilled workers!

Instead, you want to train yourself to see value so that you can gather more of it and help it grow! It’s not just being a kind leader because that’s “in” this decade, gratefulness is actually a strength you can develop that helps you make effective decisions.

Leave a comment

Filed under Management

How To Learn DevOps?

Hey all! James and I are preparing to revise our LinkedIn Learning course, DevOps Foundations, a three hour set of videos designed to orient beginners in the whole scope of DevOps. 

We created the course in 2016 primarily because at the time there were no good introductions to DevOps. You needed to know what blogs to follow and what events to go to and that was it. Even the DevOps Handbook hadn’t come out yet. And this provided a very high barrier to entry to the field. And we believe in learning and collaboration so we knew what we had to do!

Since then, it’s been one of the top tech courses on LinkedIn Learning with over 400,000 learners so far and has generated a dozen other courses drilling down into detail in specific areas. The things that make it worth it to me is the people we run across who say “this helped me improve my career.” My favorite was one gentleman who pulled me aside at the Aqua Security booth at RSA back before the pandemic and said “Hey, I had just gotten out of the Army and was trying to get a good job, and so was looking at tech. Your course oriented me enough that I got a sales job here!” Being able to help people like that is a rare privilege and we really value it.

Please fill out our survey to let us know what you think are the key things someone needs to learn about DevOps – whether they have some existing dev or ops knowledge or are just getting into it!

Here’s the old table of contents for reference… A lot of this hasn’t changed, the basics are still the basics, but it has been 7 years and a lot has changed, some things to add, some things to change, some things to cut. Let us know your opinion!

  • DevOps Basics
    • What Is DevOps? – Understand the meaning of DevOps and why you might care about it.
    • DevOps Core Values: CAMS – Culture, Automation, Measurement, and Sharing are the core values of DevOps.
    • DevOps Principles: The Three Ways – The Three Ways can guide your strategic approach to DevOps.
    • Your DevOps Playbook – There’s a developing list of patterns and methodologies that can help you transition to DevOps.
    • Ten Practices for DevOps Success: 10 through 6 – Tactical, pragmatic tips for DevOps success in your organization
    • Ten Practices for DevOps Success: 5 through 1 – Tactical, pragmatic tips for DevOps success in your organization
    • DevOps Tools – the Cart Or The Horse? – The role of tools in DevOps and tips for selecting and using tooling to achieve your end goal.
  • DevOps: A Culture Problem
    • The IT Crowd and the Coming Storm – Existing IT culture has both internal and external problems. Meanwhile, new challenges of scale and business cadence are pressing technology departments to change.
    • Use Your Words – Communication is the key to collaboration and solving problems when the stakes are high.
    • Do Unto Others – Build trust and respect and eliminate blame and hostility in your teams.
    • Throwing Things Over Walls – Break down the silos and establish a culture of responsibility and ownership, and align your teams to support the flow of concept to cash.
    • Kaizen: Continuous Improvement – Everything can be iterated upon to make it better – even yourself!
  • The Building Blocks of DevOps
    • DevOps Building Block: Agile – DevOps extends Agile principles to include deployment and operations.
    • DevOps Building Block: Lean – Understanding Lean can be the difference between a DevOps implementation that helps you achieve your company’s goals and one that’s just “the same but different.”
    • ITIL, ITSM, and the SDLC – Where does the “old school” fit in to a DevOps world?
  • Infrastructure Automation
    • Infrastructure As Code – Take a fundamentally different approach to building distributed systems whether in the datacenter or in the cloud.
    • Golden Image to Foil Ball – Learn about configuration mangement, automated provisioning, deployment and orchestration.
    • Immutable Deployment – With the rise of containers, different CM patterns are gaining currency.
    • Your Infrastructure Toolchain – Common tools in this space include Chef, Puppet, and Ansible but new container-based approaches like docker are on the rise. [Yes, this was before terraform and kubernetes, definitely places to update]
  • Continuous Delivery
    • Small + Fast = Better – Delivering small batches of change quickly reduces risk, improves quality, and restricts technical debt.
    • Continuous Integration Practices – Learn about Continuous Integration, Delivery, and Deployment, which you need and how to get there.
    • The Continuous Delivery Pipeline
    • The Role Of QA – Move from manual testing to automated with Test Driven Development (TDD) and Behavior Driven Development (BDD).
    • Your CI Toolchain – From Github to Jenkins, your code pipeline consists of many different parts with specific functions.
  • Reliability Engineering
    • Engineering Doesn’t End With Deployment – If you build it, you run it and other patterns for reliability engineering.
    • Design For Operation – Theory – Building a system to be resilient is the highest leverage step in ensuring high uptime and low MTTR.
    • Design For Operation – Practice – Ops has learned hard lessons about resiliency over the years – take it into account when building your applications.
    • Operate For Design: Metrics and Monitoring – Operational support isn’t just keeping the systems up, it provides crucial feedback back into the development cycle. {Yes, the kids call this observability now]
    • Operate for Design: Logging
    • Your SRE Toolchain – Monitoring, troubleshooting, and metrics are a vital space in your tooling strategy.
  • Additional DevOps Resources
    • Unicorns, Horses, and Donkeys, Oh My – In an emerging discipline, going to events to learn from other expert practitioners is your fastest route to success.
    • Ten Best DevOps Books You Need to Read – There’s a growing number of books on DevOps, here’s our top 10 reading list.
    • Navigating The Series of Tubes – DevOps information on the Web is fragmented and hard to find sometimes; here’s some of the best places to watch.
  • The Future of DevOps
    • Cloud to Containers to Serverless – Profound changes to our computing model have arrived to challenge many of our established practices.
    • The Rugged Frontier of DevOps: Security – Security is changing and is rapidly uptaking the DevOps movement, we cover the major implications here. {Yes. the kids call this DevSecOps now]
  • Conclusion
    • Next Steps: Am I a DevOp now? – Learn what next steps you should pursue for growing in DevOps understanding and practice.

Leave a comment

Filed under DevOps

DevOpsDays Austin 2023 Tips!

Well, the Agile Admins have handed the reins of DevOpsDays Austin off to a new generation! And DoDA 2023 is coming up next week! I’ll be there, participating rather than wrangling for once…

Shaun Mouton, one of the new core organizers, asked me to share an annotated overview of items which may be of interest to attendees! So read on, and hope to see you out at the conference.

Austin musical notes

for Visitors and Interested Parties

DevOpsDays Austin 2023 is coming up quite soon, and I thought I’d mention for the out-of-towners and folks who might want to know that there are some decent shows happening around the same time:

May 3rd: The Black Dahlia Murder w/Terror, etc at the Mohawk

  • Metalheads, trudge through some sludge. TBDM is here.

Daisy The Great at Antone’s

  • If twee indie pop heavy on harmony is your thing get your fill of Brooklyn’s darling sextet at Antone’s. Portland’s Olive Klug is playing too and this will probably be a really fun show.

The Drakes at Saxon Pub

  • The Drakes put on a tremendous rock n roll show at one of Austin’s classic venues.

Arc Angels at Gruene Hall

  • I can’t make this show and I’m bummed about it. You should go catch this Hill Country supergroup at one of Texas’ finest music venues.

Warren Hood at ABGB

  • I’ve heard good things about his shows but haven’t made it yet. Still, I feel pretty comfortable recommending this one.

Wednesdays with W.C. Clark at Pinballz Kingdom in Buda

  • The Godfather of Austin Blues lays it down. W.C.’s still got it and you can get it too.

Libby and the Loveless

  • Sam’s Town Point is a great place to hang out and see a show, and L&tL (nobody calls them that) will play a fine mix of country standards.

Michael Hale Organ Trio & Sketch at C-Boy’s Heart & Soul Bar

  • If you’re coming to Austin for the first time or haven’t been to C-Boy’s you might want to make this show. Great venue, great music.

Matt the Electrician & friends at The 04 Center

  • Matt’s fantastic, this is likely to be a great show, and the 04 is a good venue for them.

May 4th: Lil Wayne at Stubb’s

  • I might skip out on evening events if I can make this one. Sorry y’all, it’s Tha Carter.

Tennis at ACL Live at the Moody Theater

  • Tennis is a bit precious, but if you’re into it they’re a lot of fun. Dance it out at the Moody.

Barbara Nesbitt & Friends at The Continental Club Gallery

  • I’m getting a little annoyed writing these now, there’s so much to see. Nesbitt’s voice is a delightful slice of Americana.

Two Step Lessons

  • You’ve got good choices if you want to learn how to two-step on Thursday. Sam’s Town Point and the White Horse cater to newbies who want to learn how to put a little twang in their electric slide.

Large Brush Collection, Little Mazarn, Jenny Carson at Feels So Good

  • FSG started out as a differently named screenprinting shop and showed up at a few local tech conferences making shirts for attendees to-order. They’re chill people and put on a great series of shows at the shop.

Greg Koch at the 04 Center

  • Pretty sure Koch is going to tear the A-frame roof of the sucker. If you’re into groovy six-string acrobatics this will be a fun outing.

The Arc Angels with Madam Radar at Riverbend Centre

  • Again, probably going to have to miss this one For Reasons, but I’m not happy about it. Grab this chance to see some of our local greats burn the house down.

Manny Velazquez at the Little Longhorn Saloon

  • Manny V knows country music, puts on a good show, and Austin’s lucky to have him. Classic country sound at a fun little venue.

May 5th: The Blues Specialists at The Continental Club

  • The Blues Specialists have been holding down the Continental Club for ages with their Texas-style jump blues. If that sounds even a little like your jam, it’s absolutely your jam. Get you to the Continental, friend.

The Psychedelic Furs at ACL Live at the Moody Theater.

  • You a Furs fan? This would be a decent opportunity to catch them. The Moody Theater is a fine place to see a show. Oh, and apparently Evan Dando too, as a treat.

Charlie Robison at Gruene Hall

  • Charlie Robison is a genuine Austin treasure, and Gruene Hall is a stellar venue to see him perform. One of the finest singer-songwriters to come out of a town overflowing with them.

Wild Child at Emo’s

  • Austin indie pop band, they’ve got a lot of fun songs. The vocalist reminds me of my favorite New Orleans chanteuse.

Austin tasty eats

local food recommended by a local (it me, I grew up here)

Start here with this guide from Paul Czarkowski and friends for stuffing your face around this place. It could be somewhat out of date, things change pretty fast around here. I’ll add some notes of my own here even though I’ve contributed to that before:

Central TX BBQ

  • Don’t bother with the BBQ sides, it’s all about the meat. Have a nice salad somewhere else before and after or maybe a smoothie from Juiceland. Kerlin BBQ has sadly closed up shop, although they do still sell tasty kolaches. Gourdough’s may have closed too, which would be a blessing for my waistline.

Quesabirria tacos

  • These are still pretty hot right now, but prepare yourself. You dip the tacos in the cup of consomme and it all drips, this ain’t for fancy dress occasions. Bring extra napkins, and eat em fast before the tacos cool off from the griddle.

La Tunita 512 – 2400 Burleson Rd

  • this was one of Austin’s first offerings for quesabirria de res, and they’re delicious.

Actual Tacos

  • There’s Tacodeli and Torchy’s for the white people food that’s pretty tasty, and then there are tacos. These are taco joints.

Cuantos Tacos – 1108 E 12th St

  • Located about a mile away from the Alumni Center, Cuantos serves the sort of tacos you might find in CDMX. They’re good.

Veracruz All Natural (and Veracruz Fonda)

  • Somewhat fancy, somewhat down-to-earth, Veracruz is tasty and I am happy to recommend them to you.

Other Items Of Interest

If you’re spending any amount of time here and need something not covered by this guide feel free to holler at me on whatever social media platform you favor and can find me, I’ll be happy to come up with something that’ll put a smile on your face. I’m glad you’re going to be at the conference, please say hi or wave in my general direction if you get a chance!

Shaun

Leave a comment

Filed under Conferences, DevOps

New DevOps Courses Are Out!

James and I were revising one of our LinkedIn Learning courses, DevOps Foundations: Infrastructure as Code to keep up with the times, and while we were out there filming we knocked out some new courses as well!

DevOps for Managers talks about DevOps from the management perspective. What do you need to know, and how can you best unlock the success of the teams you’re working with when they are – or want to – excel at DevOps?

DevOps Antipatterns explores some common pitfalls that people fall into when starting out (or, even, later…)

Check them out! For reference here’s our whole curriculum that the agile admins have put out to help you in your path to thriving in tech.

Agile Admin LinkedIn Learning Courses

DevOps 101

DevOps 200-level

DevOps 300-level

DevOps 400-level

DevSecOps

Cloud Native

Observability

Leave a comment

Filed under DevOps