
DORA AI Coding Report Breakdown

There’s a new DORA report out from Google, but it’s not the usual DevOps one we’ve come to expect – this one is entirely focused on the state of AI-assisted software development.

That’s not too surprising – straight-up DevOps is last decade’s news. Gene Kim rebranded the DevOps Enterprise Summit and is publishing vibe coding books, DevOps OGs like Patrick Debois and John Willis have been focusing on AI building, so it makes sense that the DORA crew are also poking in that direction.

A lot of the shift in DevOps in recent years has been toward developer productivity. Whether it’s the rise of platforms to take burden and complexity away from devs, or Nicole Forsgren’s newer SPACE metrics that extended her earlier Accelerate/DORA metrics focused just on software delivery, everyone is keenly aware that unlocking developers’ ability to create is important.

Companies I work with are really prioritizing that. ServiceNow got Windsurf licenses for everyone and reports a 10% productivity boost from it. And just “we have some AI” isn’t enough – Meta just cut one of their major AI teams because it had “gotten too bureaucratic” and slow, and moved people to a newer team where they could get more done. Companies are taking developer productivity very seriously, spending real money and making big changes to get it.

Understanding Your Software Delivery Performance

As you read the report, you’ll notice that large chunks of it are NOT about AI directly. This first chapter, for example, recaps the important areas from previous DORA reports. It talks about metrics for software delivery and characterizes kinds of teams you see in the wild and their clusters of function and dysfunction. You don’t really get to AI till page 23.

Is this “AI-washing”? If so, it’s justified. People want “AI” to be the solution when they don’t understand their problem, or how to measure whether their problem is solved. AI can help with software engineering and DevOps, but it does nothing to change the fundamental nature of either – so if you don’t understand the non-AI basics and you’re handed AI to turn loose on your company, you may as well be an armed toddler.

AI Adoption and Use

The report has good stats that dig deeper than news reports – while 90% of people are “using AI”, in general they use it maybe 1-2 hours out of their day and don’t go to it first all the time.

The thing I found most surprising was what people were using it for. In my experience folks reach for AI for lighter work more often than for actually writing code, but the research showed writing code was by far the most common use case (60%), with things like internal communication near the bottom (48%) – only calendar management was lower, at 25%, but the tools for that are terrible IMO.

Chatbots and IDEs are still the vast majority of how people interact with AI; integrated tool platforms only have 18% traction.

People do in general believe they’re more productive from using AI, by a wide margin, and also believe their code quality has gone up! Pure vibe coding produces terrible quality code, so I believe this is because real coders are using AI more thoughtfully than just “write this for me.” And this is borne out in the trust metrics – most people do NOT trust AI output. 76% of respondents trust AI somewhat, a little, or not at all – despite 84% believing it has increased their productivity.

I think that’s super healthy – you should not trust AI output, but if you keep that in mind, it lets you use it and be more productive. You just have to double-check and not expect magic. Consider that ServiceNow article I linked above about their Windsurf adoption: it’s not realistic to think AI is going to give you orders of magnitude of coding productivity increase – but 10% is great, more of an improvement than most other things you can do!

AI and Key Outcomes

That leads us into the meatier portion of the report, which takes the research past “what people think” and tries to correlate real outcomes to these factors. Which is a little tricky, because developer morale is part of what contributes to delivery, and there may be a “placebo factor” where believing AI tools are making you better makes you better, whether or not the tool is contributing!

What they found is that while AI use really does improve individual effectiveness, code quality, and valuable work, it doesn’t help with friction and burnout, and it significantly increases software delivery instability.

So what do we make of increased software delivery instability when we think we’re generating more and better code? And we think the org performance is still doing better? The report doesn’t know either.

My theory is similar to the answer to “why doesn’t everyone run multi-region systems when AWS us-east goes down from time to time?” Just to refresh you on the answer to that one, “it’s more expensive to do it right than to have an outage from time to time.” If you can cram more code down the pipe, you get more changes and therefore more instability. But just like companies gave up on shipping bug-free code long ago, some degree of failure with the tradeoff of shipping more stuff is a net financial win.

AI Capabilities Model

The reason I love DORA is they go deep and try to establish correlation of AI adoption best practices to outcomes. On page 49 is their big new framework for analyzing AI’s impact on an org. Here’s what they have so far on how specific practices correlate to specific outcomes, with the caveat that it’ll take another year of data to know for sure (though AI innovation cycles are month by month, so I hope they find a way to gather data more quickly than a yearly cadence).

Platform Engineering

The report then takes another turn back to earlier DORA topics and talks about platform engineering, the benefits, and how to not suck at it.

For those who are unclear on that: you get wins from a platform that is user-centric. So many organizations don’t understand that – or deliberately mis-understand it. You could call all the old centralized IT solutions from previous decades a “platform” – Tivoli, HP WhateverCenter, and so on – but they were universally hateful and got in the way of progress in the name of optimizing the work of some commodity team behind a ticket barrier. (I’ll be honest, there’s a lot of that at my current employer.)

I’m going to go a step farther than the report – if you don’t have a product manager guiding your platform based on its end users’ needs, your platform is not really a platform, it’s a terrible efficiency play that is penny wise but pound foolish. Fight me.

Anyway, they then say “platforms, you know, it’s the place you can plug in AI.” Which is fine but a little basic.

Value Stream Management

Is important. The premise here is value flow (if you don’t know about lean and value streams and stuff, I’ve got a LinkedIn Learning course for you: DevOps Foundations: Lean and Agile). Systems thinking dictates that accelerating individual pieces of your workflow can actually harm your overall throughput, so major changes mean you need to revisit the overall value stream to make sure it’s still the right flow, and measure so you understand how speeding up one piece (like, oh say, making code) affects other pieces (like, oh say, release stability).

They find that AI adoption gets you a lot more net benefit in organizations that understand and engineer their value stream.

The AI Mirror

This section tries to address the mix of benefits and detriments we’ve already talked about with AI. It basically just says hey, rethink how you do stuff and see if you can use AI in a more targeted way to improve the bad pieces, so for software delivery try using it more for code reviews and in your delivery pipelines. It’s fine but pretty handwavey.

That’s understandable, I don’t think anyone’s meaningfully figured out how to bring AI to bear on the post-code writing part of the software delivery pipeline. There’s a bunch of hopefuls in this space but everything I’ve kicked the tires on seems still pretty sketch.

Metrics Frameworks

You need metrics to figure out if what you’re doing is helping or not. They mention frameworks like SPACE, DevEx, HEART, and DORA’s software delivery metrics, and note that you should be looking at developer experience, product excellence, and organizational effectiveness. “Does AI change this?” Maybe, probably not as much as you think.

And that’s the end at page 96 – there are 50 pages of credits, references, data, and methodology if you want to get into it.

Those last four chapters feel more like an appendix; they don’t really flow with the rest of the report. The AI capabilities model talks about things to do to specifically boost your AI capabilities (clear and communicated AI stance… working in small batches) which somewhat overlap with these later chapters (quality internal platforms, user-centric focus) but to a degree don’t. If value stream management is shown to improve your AI outcomes, then why isn’t it in the capability model?

I assume the answer is, to a degree, “Hey man this is a work in progress” which is fair enough.

Conclusion

I find two major benefits from reports like this, and judge their success based on how well they achieve them.

  1. Showing clear benefits of something, so you can use it to influence others to adopt it. This report does very well there. One of my complaints about the DORA reports is that in recent years they’d become more about the “next big thing” than about demonstrating the clear benefits of core DevOps practices, so I’d often go back and refer to older reports instead of the newer ones. But here – are people getting benefit from AI? Yes, and here’s what, and here’s what not. Very clear and well supported.
  2. Telling you how to best go about doing something, so you can adopt it more effectively. The report also does well here, with the caveat of “so much of this is still emerging and moving at hyperspeed that it’s hard to know.” They’ve identified practices within AI adoption and in the larger organization that are correlated to better outcomes, and that’s great.

And I do like the mix of old and new in this report. You have to wave the new shiny at people to get them to pay attention, but in the end there are core truths about running a company and a technology organization within a company – value streams, metrics, developer experience, release cadence and quality – that AI or any new silver bullet may change the implementation of, but does not change fundamentally, and it’s a good reminder that adopting sound business basics is the best way to take advantage of any new opportunity, in this case AI.

TL;DR – Good report, use it to learn how people are benefitting from AI and to understand specific things you can do to make your organization benefit the most from it!


Filed under AI, DevOps

OpenAI vs Opus – the Penguin

How does OpenAI compare versus a penguin in the role of a legal secretary? Let’s see…

import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read local .env file

openai.api_key = os.getenv('OPENAI_API_KEY')

# Note: this is the pre-1.0 openai SDK (openai.ChatCompletion);
# a 1.x-style equivalent is sketched at the end of this post.
def get_completion(prompt, model="gpt-3.5-turbo", temperature=0):
    """Send a single user message and return the model's reply text."""
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message["content"]

prompt = """
Translate the following letter from a legal firm to a delinquent client, Fred 'The Cincinnati Strangler' Johnson, from hostile slang to a friendly business letter: 
'Attention criminal pervert: Where the hell is my dough, you twisted goon?  Don't forget who got you sprung on that technicality.  I want the money.  Now.  P.S. Have you strangled your wife yet, psycho-brain?'
"""
response = get_completion(prompt)
print(response)
Dear Mr. Johnson,

We are writing to follow up on the outstanding balance on your account. As your legal representatives, we were able to assist you in your legal matters and we kindly request that you fulfill your financial obligations to us promptly.

Please be advised that we require payment as soon as possible. We appreciate your attention to this matter and look forward to receiving your payment promptly.

Thank you for your cooperation.

Sincerely,

I still have to give it to Opus by a flipper, mainly for the P.S., but OpenAI got into the right general headspace! And it’s too hot here in Texas for me to keep a penguin around to translate what I want to say into more acceptable terms, and I sure get tired of doing it.

I miss my old Eudora email client that would put little hot peppers next to my email if it thought it was too spicy…
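Side note on the code above: it uses the pre-1.0 openai Python SDK (the openai.ChatCompletion style). If you’re on the 1.x client, the equivalent looks roughly like this – a quick sketch I haven’t run against the current library, so treat the details as an approximation:

# Rough 1.x-style equivalent of get_completion() above (untested sketch).
import os
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_completion(prompt, model="gpt-3.5-turbo", temperature=0):
    # chat.completions is the 1.x replacement for openai.ChatCompletion.create
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content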


Filed under AI, General

The Right Way To Use Tagging In The Cloud

This came up today at work and I realized that over my now-decades of cloud engineering, I have developed a very specific way of using tags that sets both infra dev teams and SRE teams up for success, and I wanted to share it.

Who cares about tags? I do. They are the only persistent source of information you can trust (as much as you can trust anything in this fallen world) to communicate information about an infrastructure asset beyond what the cloud or virtualization fabric it’s running in already knows. You may have a Terraform state, you may have a database or etcd or something that knows what things are – but those systems can go down or get corrupted. Tags are the one thing that anyone who can see the infrastructure – via console or CLI or API or integrated tool – can always see. Server names are notoriously unreliable: ideally in a modern infrastructure you don’t reuse servers from one task to another or put multiple workloads on one box, but that’s a historical practice that pops up all too often, and server names have character limits (and even if they don’t, the management systems around them usually enforce one).

Many powerful tools like Datadog work by exclusively relying on tags. It simplifies operations and prevents errors when a new production app server automatically gets pulled into the right monitoring dashboards and alerting schemes because it is tagged right.
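Here’s roughly what that looks like in practice – a minimal sketch using the classic datadogpy client, with made-up names, tag values, and thresholds. The point is that the monitor is scoped purely by tags, so any new box that shows up with those tags is covered automatically.

# Sketch: a Datadog monitor scoped entirely by tags, so newly tagged hosts
# are picked up automatically. Assumes the classic "datadog" (datadogpy) client;
# keys, names, and thresholds here are placeholders.
from datadog import initialize, api

initialize(api_key="YOUR_API_KEY", app_key="YOUR_APP_KEY")

api.Monitor.create(
    type="metric alert",
    query="avg(last_5m):avg:system.cpu.user{environment:prod,application:banking,role:apiserver} > 90",
    name="High CPU on prod banking apiservers",
    message="CPU is high on a prod banking apiserver. @myteam@mycompany.com",
    tags=["environment:prod", "application:banking", "role:apiserver"],
)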

I’ve run very large complex cloud environments using this scheme as the primary means to drive operations.

Top level tag rules:

  1. Tag everything. Tagging’s not just for servers. Every cloud element that can take a tag, tag. Network, disk images, snapshots, lambdas, cloud services, weird little cloud widgets (“S3 VPC endpoint!”).
  2. Use uniform tags. It’s best to specify “all lower case, no spaces” and so on. If people word a tag slightly differently in two places, the value is lost. This goes for both the key and the value, but especially the key – teach people that if you say “owner” that means “owner”, not “Owner” or “owning party” or whatever else.
  3. Don’t overtag with attributes you can easily see. Instance size, what AZ it’s in, and so on are already part of the cloud metadata, so it’s redundant to add tags for them.
  4. Use standard tags. This is what I’ll cover in the rest of this article – and there’s a small validation sketch right after this list.
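To make rules 2 and 4 enforceable rather than aspirational, I like to put a tiny validator in the provisioning path. Here’s a minimal sketch – the required keys and allowed environment values are just examples, use whatever your standard is:

# Sketch: validate a proposed tag set against the standards before provisioning.
# Required keys and allowed values below are illustrative, not gospel.
REQUIRED_KEYS = {"owner", "billing", "environment", "application", "role"}
ALLOWED_ENVIRONMENTS = {"dev", "test", "uat", "prod"}

def validate_tags(tags: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the tags pass."""
    problems = []
    for key, value in tags.items():
        if key != key.lower() or " " in key or value != value.lower() or " " in value:
            problems.append(f"non-uniform tag (use lower case, no spaces): {key}={value}")
    missing = REQUIRED_KEYS - set(tags)
    if missing:
        problems.append(f"missing required tags: {sorted(missing)}")
    env = tags.get("environment")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {env}")
    return problems

print(validate_tags({"owner": "myteam@mycompany.com", "Environment": "Prod"}))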

At the risk of oversimplifying, you need two things out of your systems environment – compliance and management. And tags are a great way to get it.

Compliance

Attribution! Cost! Security! You need to know where infrastructure came from, who owns it, who’s paying for it, and if it’s even supposed to be there in the first place.

Who owns it?

Tag all cloud assets with an owner (an email address) – basically whatever is required to uniquely identify who owns an asset. It should be a team email for persistent assets; if it’s a personal email, the assumption should be that if that person leaves the company those assets get deleted (good for sandboxes etc.).

The amount of highly paid engineer time I’ve seen wasted over the last decade of people having to go out and do cattle calls of “Hey who owns these… we need to turn some off for cost or patch them for security or explain them for compliance… No really, who owns these…” is shocking.

owner:myteam@mycompany.com

Who’s paying for it

This varies but it’s important. “Owner” might not be sufficient in an environment – often some kind of cost allocation code is required based on how your company does finances. Is it a centralized expense or does it get allocated to a client? Is it a production or development expense? Those are often handled differently from a finance perspective. At scale you may need a several-parter – in my current consulting job there’s a contract number but also a specific cost code inside that contract number that we need all expenses divvied up between.

billing:CUCT30001

Where did it come from

Traceability both “up” and “down” the chain. When you go look at a random cloud instance, even if you know who it belongs to you can’t tell how it got there. Was it created by Terraform? If so where’s the state file? Was it created via some other automation system you have? Github? Rundeck? Custom python app #25?

Some tools like CloudFormation do this automatically. Otherwise, consider adding a source tag or set of tags with sufficient information to trace the live system back to the automation. Developers love tagging git commits and branches with versions and JIRA tickets and release dates and such; the same concept applies here. Different things make sense depending on your tech stack – if you GitOps everything then the source might be a specific build, or you might want to say which S3 bucket your tfstate is in… As an example, I’m working with a system that is Terraform-instantiated from a GitOps pipeline, so I’ve made a source tag that says github, then the repo name, then the action name. And for the tfstate, I have a tag pointing at the S3 bucket it’s saved in, named “mystatebucket.”

source:github/myapp/deploy-action
sourcestate:s3/mystatebucket

When does it go

OK, I know the last two sound like the lyrics to “Cotton-Eyed Joe”, which is a bonus. But a major source of cost creep is infrastructure that was intended to be there for a short time – a demo, a dev cycle – that ends up just living forever. And sure, you can just send nag-o-grams to the owner list, but it’s better to tag systems with an expires tag in date format (ideally YYYY-MM-DD-HH-MM-SS, as God intended). “expires:never” is acceptable for production infrastructure, though I’ve even used dated expirations on autoscaling prod infrastructure to make sure systems get turned over and don’t live too long.

expires:2025-02-01-00-00-00
or
expires:never
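Once that tag exists, reaping is a small scheduled job. Here’s a minimal sketch using boto3 against EC2 – the scheduler, the region, and what you actually do with expired instances (report, stop, terminate) are up to you:

# Sketch: find EC2 instances whose "expires" tag is in the past.
# Assumes tags follow the YYYY-MM-DD-HH-MM-SS format above; "never" is skipped.
from datetime import datetime, timezone
import boto3

ec2 = boto3.client("ec2")
now = datetime.now(timezone.utc)

paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate(Filters=[{"Name": "tag-key", "Values": ["expires"]}]):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            expires = tags.get("expires", "never")
            if expires == "never":
                continue
            expiry = datetime.strptime(expires, "%Y-%m-%d-%H-%M-%S").replace(tzinfo=timezone.utc)
            if expiry < now:
                # Start with reporting; graduate to stop/terminate once people trust it.
                print(f"EXPIRED: {instance['InstanceId']} owner={tags.get('owner')} expired {expires}")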

Management

Operations! Incidents! Cost and security again! Keep the entire operational cycle, including “during bad production incidents”, in mind when designing tags. People tear down stacks/clusters, or go into the console and “kill servers”, and accidentally leave other infrastructure behind – you need to be able to identify and clean up orphaned assets. Hackers get your AWS key and spin up a huge volume of bitcoin miners. Identifying and acting on infrastructure accurately and efficiently is the goal.
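A quick way to surface the orphans and the miners alike is to sweep for anything missing your required tags. A minimal boto3 sketch – the required keys are the ones from this article, and sweeping only EC2 is just an example:

# Sketch: flag EC2 instances missing the required ownership/compliance tags.
# Anything that shows up here is either an orphan, a mistake, or an intruder.
import boto3

REQUIRED_KEYS = {"owner", "billing", "environment", "application", "role"}

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"] for t in instance.get("Tags", [])}
            missing = REQUIRED_KEYS - tags
            if missing:
                print(f"{instance['InstanceId']}: missing tags {sorted(missing)}")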

As in any healthy system, the “compliance” tags above aren’t just useful to the beancounters, they’re helpful to you as a cloud engineer or SRE. But beyond that, you want a taxonomy of your systems to use to manage them by grouping operations, monitoring, and so on.

This scheme may differ based on your system’s needs, but I’ve found a general formula that fits in most cases I come across. Again, it assumes virtual systems where servers have one purpose – that’s modern best practice. “Sharing is the devil.”

EARFI

I like to pronounce this “errr-feee.” It’s a hierarchy to group your systems.

  • environment – What environment does this represent to you, e.g. dev, test, production, as this is usually the primary element of concern to an operator. “environment:uat” vs “environment:prod”.
  • application – What application or system is this hosting? The online banking app? The reporting system? The security monitoring server? The mobile game backend? GenAI training? “application:banking”.
  • role – What function does this specific server perform? Webserver, dbserver, appserver, kafka – systems in an identical role should have identical loadouts. “role:apiserver” vs “role:dbserver”. Keep in mind this is a hierarchy and you won’t have guaranteed uniqueness across it – for example, “application:banking,role:dbserver” may be quite different from “application:mobilegame,role:dbserver”, so you would usually never refer to just “role:dbserver.”
  • flavor – Optional, but useful in case you need to differentiate something special in your org that is a primary lever of operation (Windows vs Linux? CPU vs GPU nodes in the same k8s cluster? v1 vs v2?). I usually find there’s only one of these (besides, of course, region and things you shouldn’t tag because they are in other metadata). For our apiserver example, consider that maybe we have the same code running on all our api servers, but via load balancer we send REST queries to one set and SOAP queries to another set for caching and performance reasons. “flavor:rest” vs “flavor:soap”.
  • instance – A unique identifier among identical boxes in a specific EARF set, most commonly just an integer. “instance:2”. You could use a GUID if you really need it but that’s a pain to type for an operator.

This then allows you to target specific groups of your infrastructure, down to a single element or up to entire products.  

  • “Run this week’s security patches on all the environment:uat, application:banking, role:apiserver, flavor:rest servers. Once you verify, do the same on environment:prod.” (See the sketch after this list for what that targeting looks like in code.)
  • “The second of the three servers in that autoscaling group is locked up. Terminate environment:uat, application:banking, role:apiserver, flavor:rest, instance:2.”
  • “We seem to be having memory problems on the apiservers. Is it one or all of the boxes? Check the average of environment:prod, application:banking, role:apiserver, flavor:rest and then also show it broken down by instance tag. It’s high on just some of the servers but not all? Try flavor:rest vs flavor:soap to see if it’s dependent on that functionality. Is it load do you think? Compare to the aggregate of environment:uat to see if it’s the same in an idle system.”
  • “Set up an alert for any environment:prod server that goes down. And one for any environment:prod, application:banking, role:apiserver that throws 500 errors.”
  • “Security demands we check all our DB servers for a new vulnerability. Try sending this curl payload to all role:dbservers, doesn’t matter what application. They say it won’t hurt anything but do it to environment:uat before environment:prod for safety.”
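Here’s roughly what that tag-targeted selection looks like against EC2 with boto3 – a minimal sketch, with the patching step itself (SSM, Ansible, whatever you use) left as a stub:

# Sketch: resolve a tag query like "environment:uat, application:banking,
# role:apiserver, flavor:rest" into a concrete list of instance IDs.
import boto3

def instances_matching(**tags):
    """Return instance IDs whose tags match every key=value given."""
    ec2 = boto3.client("ec2")
    filters = [{"Name": f"tag:{key}", "Values": [value]} for key, value in tags.items()]
    ids = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=filters):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                ids.append(instance["InstanceId"])
    return ids

targets = instances_matching(environment="uat", application="banking",
                             role="apiserver", flavor="rest")
print(f"Patching {len(targets)} servers: {targets}")
# ...hand `targets` to SSM Run Command, Ansible, or whatever does the actual patching.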

So now a random new operator gets an alert about a system outage and logs into the AWS console and sees not just “i-123456 started 2 days ago,” they see

owner:myteam@mycompany.com
billing:CUCT30001
source:github/myapp/deploy-action
sourcestate:s3/mystatebucket

expires:never
environment:prod

application:mobilegame
role:dbserver
flavor:read-only
instance:2


That operator now has a huge amount of information to contextualize their work – information that at best they’d have to go look up in docs or other systems, and at worst they’d have to just start serially spamming people for. They know who owns it, what generates it, what it does, and they get hints at how important it is. (prod – probably important. A duplicate read secondary – could be worse.) And then runbooks can be very crisp about what to do in what situation by also using the tags. “If the server is environment:prod then you must initiate an incident <here>… If the server is a role:dbserver with flavor:read-only it is OK to terminate it and bring up a new one, but then you have to go run runbook <X> and run job <y> to set it up as a read secondary…”
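And if you want that runbook logic to be executable rather than prose, it’s a few lines of branching on the same tags – a minimal sketch, with the runbook and job names purely hypothetical placeholders:

# Sketch: branch incident response off the tag set, mirroring the runbook text above.
# Runbook/job names are hypothetical placeholders.
def triage(tags: dict[str, str]) -> list[str]:
    steps = []
    if tags.get("environment") == "prod":
        steps.append("Open an incident before touching anything.")
    if tags.get("role") == "dbserver" and tags.get("flavor") == "read-only":
        steps.append("OK to terminate and replace; then run runbook X and job Y "
                     "to rejoin it as a read secondary.")
    return steps or ["No special handling - follow the default runbook."]

print(triage({"environment": "prod", "role": "dbserver", "flavor": "read-only"}))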

Feel free to let me know how you use tags and what you can’t live without!


Filed under Cloud, DevOps, Monitoring