Category Archives: DevOps

Pertaining to agile system administration concepts and techniques.

SRE: The Biggest Lie Since Kanban

There is a lot of discussion lately about how SRE fits into or competes with or whatever-s with DevOps.  I’m scheduled to speak on an “SRE vs DevOps Smackdown” panel today here at Innotech Austin, and at the exact same time I see Bridget tweeting Liz Fong-Jones’ slides from Velocity on using SRE to implement DevOps. And the more I think about it, and see what people are doing, the more I’m getting worried.

The Big Lie

Just to get the easily provoked to put down their pitchforks: I don't dislike SRE and I don't dislike Kanban. The reason I call Kanban a "Big Lie" is that really doing Kanban correctly, and getting the value out of it, requires even more discipline than doing something like Scrum. But it looks so close to doing nothing new that many lazy teams out there say "they're doing Kanban," and by that they mean they're doing nothing, but they've turned on the Kanban view in JIRA for your convenience. They have no predictability, they're not managing WIP, they're not identifying bottlenecks – they just have a visible board now, and that's it. I strongly believe from my experience that most teams "doing Kanban" are really doing mostly nothing. There are articles on this blog about how I make teams I'm teaching Agile do Scrum first, to build up the required discipline, if they want to get to Kanban. And I'm not just a crank; David Hawks from Agile Velocity told our management team the same thing just yesterday, which brought this back to mind for me and spurred this article.

Because I'm starting to see the same thing with SRE. It's not surprising – there was, and is, plenty of "DevOps-washing" of existing teams out there. Rename your ops team "DevOps," done. Well, at least with DevOps we could say "it's a methodology, not a job description or a group name; stop it" to force deeper thought. That's why my team at work is the "Engineering Operations" team, not "DevOps"; Lee Thompson insisted on that when he set it up! But SRE – yeah, it's a team, just like your own ops team; from an "org chart" viewpoint it looks the same. So doing SRE can – and in many shops does – mean doing nothing new. You just call your existing ops team SRE and figure you're done.

A brief personal history lesson: my last job before DevOps hit was running the Web Systems team at National Instruments, an ops team. That's where we agile admins met; Peco and James were both ops engineers on that team! (Karthik was a dev we worked with.) We had smart people and did ops all right. We had automation, we had monitoring, we had "definition of done" standards for new services. You wouldn't have to squint too hard to just call that team an SRE team and call it a day. But I wouldn't wish that job on my worst enemy. It was brutal trying to do ops for just 4-5 dev teams, and that's with business support, some shared goals, and so on. Our quality of life was terrible, we weren't empowered, and no matter how hard we tried, success was always just out of our grasp. When we started a team actually using DevOps thinking at NI after that, the difference was night and day, and we actually began to enjoy our jobs as ops engineers. I would hate for anyone to deceive themselves into thinking they're getting the goodness they should be able to get from a DevOps/"real SRE" approach while still just doing it the way we were doing it.

I have a friend at a local legal software firm who told me they're going through and just renaming all the QA folks to SWET (Software Engineer in Test), whether they can code or not, and all the ops folks to SRE in the same manner. One might be charitable and say they're leaning forward and intend to loop around and back that up with retraining or something, but… will they? Probably not. It's just a rename to the hot new term, without any of the changes that would help those engineers succeed more in their jobs!

SRE isn't "an implementation of DevOps" if you just apply it as a name for a hopped-up ops team. Properly understood, it can be an implementation of one of the three practice areas of DevOps: continuous integration/deployment, infrastructure as code, and site reliability engineering. But note that reliability engineering doesn't start at deploy to production; so much of it is Michael Nygard-esque techniques for writing your app reliably in the first place. Reliability engineering, in usual DevOps fashion, requires dev and ops work both way back in the dev cycle and out in production to work right. It doesn't need to be a different team. If it is, and that team doesn't get to decide whether it takes over ops for a given app, and it's not allowed to spend 50% of its time on reducing toil, and you're not comping SREs like you do dev engineers – it's not SRE, and you're a liar for calling it SRE. If you don't keep DevOps principles in mind, you're just going to get your old ops team with its old problems again.

That’s why SRE is a Big Lie – because it enables people to say they’re doing a thing that could help their organization succeed, and their dev and ops engineers to have a better career and life while doing so – but not really do it.  Yes, there have been Big Lies before, which is why I cite Kanban as another example – but even if the new criminal is pretty much like the old criminal, you still put their picture up on the post office wall.

Frankly, anyone pushing SRE who doesn't put warning labels on it is contributing to the problem. "Well, but it's mentioned in chapter 20 of the second book," said someone responding to the first version of this article on Twitter. Not good enough. If something you're selling is profoundly misused, it's your responsibility to be more up front about the issues.

The Little Problems

Now, there are legitimate issues to have even with the "real SRE" model, at least in the way it's usually described. The Google books kinda try to have it both ways, describing it as an engineering practice (how I describe it above and in the SRE course I did for LinkedIn) and describing it as "a team that works this way." Even among those not SRE-washing classical ops, the generally understood model is that SRE is an org/job title for a production operations team.

There's an issue here: the problem of specialization. If you are Google scale, then sure, you're going to have to specialize, and a separate ops team makes sense. But – first of all, you are not Google scale. In my opinion, if you are under 100 engineers, you are committing an error by having a separate ops team. You need your product teams to own their products. Second of all – I don't want to make an enemy of all the lovely Google engineers out there, but is your experience with Google services that they evolve quickly and get better once they go to wide release? It's not mine. They rot. Have you used Google Hangouts lately without it ending in cursing and a move to someone's Zoom? That kind of specialization still has its downsides in terms of hindering the feedback loops that let you improve (the Second Way). Is SRE just Google-ese for "sustaining"?

I get that the Google folks say they still get feedback and innovation using the SRE model; I'm sure they do, and they work hard at it. But that doesn't change the fact that running a separate ops team is making a deliberate tradeoff between innovation and efficiency. There is no way that you get as much feedback or improve as quickly with a separate team. You can compensate for it, but you're still saying "look… not as important." Which is fine if that's your situation; I've worked at companies with 200 abandoned apps in production, and you have to do something. But "not getting there in the first place" is better.

Some of the draw of the model, and why Google is highly aligned with it, is Kubernetes itself. k8s is so complex to run that it drives people back a little bit toward the old priest-in-the-tabernacle model of "someone maintains the infrastructure, you write the app, and then you have them deploy it" – but now there are some standards (like deploying as a container) that make that OK, I guess? But if you think reliability and observability are the primary responsibility of an ops team that is not involved in constructing the application, you either have deep and profound company standards that allow seamless plugging of the one into the other, or you're fooling yourself. 90% of you are fooling yourselves.

At this conference I heard “Service meshes!  They get you observability so your devs don’t have to think about it.” Do you not see how dangerous that mindset is?

SRE, as interpreted as "a separate newfangled ops team," may work for some, but you need to be realistic about the issues and tradeoffs you're making. Consider whether product teams supporting their own products – maybe with help from a platform team building tooling and an enabling/consulting/center-of-excellence team giving expert advice – would serve you better. DevOps helped us see how the "throw it over the wall from dev to ops" model was profoundly harming our industry. Throwing it over the wall from dev to SRE doesn't improve that; it's profoundly regressive. Doing SRE "right" to compensate for this, like doing Kanban right, requires more skill and discipline, not less – so be realistic about whether you have Google levels of skill and discipline in your org, eh?

Conclusion

SRE (and Kanban) aren't bad; they have their pros and cons, but they are easy to "pretend to do" in some minimal, cargo-cult-ey way that gets you little of the benefit. And if you think spinning up an ops team and calling it SRE is "an implementation of DevOps," you've swallowed the worst poison pill the DevOps talk circuit can deal you.


3 Features I love in Kubernetes 1.11

Originally published in the cloudnative blog on July 3rd

Kubernetes 1.11 was released last week, and I spent some time looking at the features and fixes released. It’s the 2nd Kubernetes release this year, and this one comes with a lot of cool features to try out. You can take a look at the release notes here, or if you want to get down in the weeds, check out the changelog.

I'm most excited about the "Dynamic Kubelet Configuration" feature! This feature existed previously but has graduated to "beta," which means it's more stable than it was before and better established. It essentially allows you to change the configuration of Kubelet on a running cluster in a more accessible manner, using ConfigMaps. The config reference is saved as part of the Node object, which Kubelet monitors. Any change to it and Kubelet will download the referenced configuration and stop. If you're using something like systemd to watch Kubelet, it'll automagically restart Kubelet, which will start up with the new configuration. This feature is super exciting because it gives admins who manage all of the nodes a little break. In the past, any update to the config had to be rolled out individually to each node, which could be a time-consuming process.
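As a rough sketch of how you might use it (the node name, ConfigMap name, and config key below are made-up examples, and the exact fields are worth checking against the 1.11 docs), you could point a node at a ConfigMap-based Kubelet config with the official Python client:

```python
# Sketch: point a node's Kubelet at a ConfigMap-based config (Dynamic Kubelet
# Configuration, beta in Kubernetes 1.11). All names here are hypothetical.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

patch = {
    "spec": {
        "configSource": {
            "configMap": {
                "name": "my-kubelet-config",    # ConfigMap holding a KubeletConfiguration
                "namespace": "kube-system",
                "kubeletConfigKey": "kubelet",  # key inside the ConfigMap's data
            }
        }
    }
}

# Kubelet watches its Node object, downloads the referenced config, and stops;
# systemd (or whatever supervises it) restarts it with the new settings.
v1.patch_node("worker-node-1", patch)
```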

I like that Custom Resource Definitions (CRD) are a lot more usable now with versioning. In the past, you were limited to a single version of a CRD; any changes, and you had to create a new one and manually convert everything that used the old CRD to the new one. All a bit painful! With versioning, the path to using updated custom resources is more straightforward than before.
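For illustration, here's roughly what a versioned CRD looks like against the 1.11-era apiextensions/v1beta1 API (the group, kind, and version names are invented; check the field names against the docs for your version):

```python
# Sketch: a CRD serving two versions (apiextensions.k8s.io/v1beta1, Kubernetes 1.11).
# "example.com", "Widget", and the version names are made-up examples.
from kubernetes import client, config

config.load_kube_config()
ext = client.ApiextensionsV1beta1Api()

crd = {
    "apiVersion": "apiextensions.k8s.io/v1beta1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "widgets.example.com"},
    "spec": {
        "group": "example.com",
        "names": {"plural": "widgets", "singular": "widget", "kind": "Widget"},
        "scope": "Namespaced",
        "versions": [
            {"name": "v1alpha1", "served": True, "storage": False},
            {"name": "v1beta1", "served": True, "storage": True},  # storage version
        ],
    },
}

ext.create_custom_resource_definition(crd)
```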

Finally, CoreDNS was promoted to General Availability! In the early Kubernetes years, there was some confusion about which DNS provider to use, and there were a few options. For someone looking at the ecosystem from the outside, it was hard to tell which DNS solution to pick. I touched on this in my Kubernetes: CNCF Ecosystem course, and on how the CNCF was able to steer the community to a better default! It took some time, but in the end, having CoreDNS as the default DNS server will help Kubernetes be more reliable and make DNS debugging simpler for those of us dealing with the inner workings of K8s.

There are a lot more things released, so check out the release announcement if you haven’t already!

There are also a few tiny things that were released that have me excited:

First, this PR allows for Base64 decoding in a kubectl get command using go-templates. Super useful for a one-liner that decodes what's actually sitting in a secret (there's a Python sketch of the same idea below).

Second, from a monitoring perspective, Kubelet will expose a new endpoint, /metrics/probes. This presents a new Prometheus metric that contains the liveness and readiness probe results for the Kubelet. It will allow you to build better health checks and get a better picture of the state of your cluster.

Third, RBAC decisions are in the audit logs as audit events! Since I’ve worked on authn and authz systems in the past, I get irrationally excited about stuff like this. In the past, we’d have to go hunting through logs to find why an RBAC call passed/failed, whereas now we can quickly look at the audit events stream.
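Back to that first item: secret values come back base64-encoded, which is exactly what the new go-template function saves you from decoding by hand. Here's a rough equivalent from Python using the official client, for comparison (the secret name, namespace, and key are hypothetical):

```python
# Sketch: read a Secret and decode one of its values.
# The secret name, namespace, and data key below are made-up examples.
import base64

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

secret = v1.read_namespaced_secret("my-app-credentials", "default")
password = base64.b64decode(secret.data["password"]).decode("utf-8")
print(password)
```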

That’s my (biased) list! What about you? What feature or bugfix has you excited? Let me know in the comments below, or tweet at me @iteration1!


Cloud-native helloworld

Originally published  on cloudnative labs on June 28th, 2018

Speaking and writing come pretty naturally to me, but setting a title is always the hardest part. It's true while writing code as well – writing 1000 lines of code comes naturally, but when I have to create and name a new file, it's a different story…

But, I digress – hi! I'm Karthik Gaekwad, and I'm the newest member of the Developer Relations team here at Cloud-native labs. If you live in Austin, we've probably already crossed paths at one of the many meetups I attend or run, including CloudAustin, Austin DevOps, Docker Austin, OWASP, etc.; or perhaps at DevOpsDays Austin, for which I've been one of the core organizers since its inception in 2012. I'm also an author on Lynda.com and have created a few courses on Kubernetes and on Agile and DevOps methodologies.

I'm joining the Cloud-native labs team from the Oracle Container Engine team, which builds Oracle's managed Kubernetes service running on Oracle Cloud Infrastructure. Naturally, I'll be focusing my efforts on Kubernetes, microservices, and cloud-native architectures and applications.

There are many things I'm excited about with the new job, but I'm most excited to learn and to teach! The one constant theme I've noticed with Kubernetes over the last few years, since it got hot, is the word "How?". As a user of Kubernetes, I've frequently found myself in the Kubernetes docs searching for answers, and as a Lynda author, I've received many messages of thanks from viewers who now know how to use Kubernetes. The cloud-native ecosystem is one of the fastest-growing ecosystems I've seen, and it's hard to keep up with the changes, new releases, and new projects that support it. As a result, I'm excited to spend more time keeping pace with all the new happenings, researching best practices for microservices and cloud-native apps, welcoming new users to the world of K8s, and helping bridge the gap to the cloud-native platforms we have on OCI today.

I'll be spending a lot of time researching, speaking, blogging, and answering questions! Feel free to reach out to me on Twitter or LinkedIn, or comment here as well – I'm here for you!

-Karthik


DevOpsDays Austin 2018 Videos Posted

Well, we were "unplugged," but we managed to smuggle videos out anyway for your pleasure… Watch 'em, like 'em, comment to the speakers that you appreciate them giving back to the global tech community! Especially since this year the talks weren't pre-selected – voting was done at the event – so these folks prepared a talk without knowing for sure they'd get to give it, which takes guts!


DevOpsDays Austin 2018 Retrospective and 2019 Prospectus

All right, DevOpsDays Austin 2018 went great, and the organizers (thanks be unto them – James Wickett, Dan Zentgraf, Boyd Hemphill, Richard Boyd, Scott Baldwin, Lee Thompson, Karthik Gaekwad, Marisa Sawatphadungkij, Ian Richardson, Bill Hackett, Chris Casey, Carl Perry, and our ConferenceOps finance handler Laura Wickett) have had the time to do a retrospective, and to both share what we've learned and set a course for next year's event! This is long and I assume mostly of interest to other DevOpsDays organizers, so buckle in.

DoD Austin this year was another experimental year. Austin was the third DevOpsDays city in the US and the eleventh globally, and has been going every year since 2012.  Because our community has such a long history with DevOpsDays, we experiment with our format to find what works the best for us.

This year, we tried a couple daring things (more details in DevOpsDays Summit Austin 2018 – “DevOps Unplugged”):

  1. Voting on talks onsite instead of ahead of time (saw this at ProductCamp Austin)
  2. No sponsor booths (like the early DevOpsDays, Silicon Valley was like this for several years)
  3. Boxed lunches (like the early DevOpsDays, Silicon Valley was like this for several years)
  4. Capped headcount low at 400 (despite having sold 650 tickets last year)
  5. No streaming the talks (video is coming though)

Read the linked article for why, but the TL;DR is that we’re a nonprofit conference that exists to drive community engagement, and the “DevOps Talk Circuit,” the increased sponsor lead-churn demands, the time we spent on fancy lunches and such, and just the sheer number of attendees and weight of extras we were adding on were choking out the actual goal of the conference.  Despite having a huge slate of great keynoters at 2017 and everything being the biggest and best DoDA ever – we the organizers didn’t have a good time. We didn’t learn anything or make new friends. And we heard from other experts in town that said the same thing. So a dramatic change was implemented to pare the event back down to basics.  But how’d it work out?

We did a bunch of retrospective activities to find the answer!

  1. SurveyMonkey survey of all attendees
  2. Survey of all sponsors
  3. Community retrospective at the Austin DevOps user group
  4. Organizer retrospective

Attendee Survey Feedback

Of 400 attendees, we got 51 survey respondents (about 13%). Our overall NPS was 25 ("pretty good"). We don't have a last-year NPS to compare to; we didn't do a great job of post-event surveying last year, mostly due to burnout (once you've spent most of your time prepping a conference, it's time to get back to your real work, family, etc.).

Food Quality: Very high – 9 (18%), High – 20 (39%), Neither – 17 (33%), Low – 4 (8%), Very low – 1 (2%)
Talk Quality: Very high – 6 (12%), High – 27 (53%), Neither – 9 (18%), Low – 8 (16%), Very low – 1 (2%)
Openspace Quality: Very high – 7 (14%), High – 12 (47%), Neither – 12 (24%), Low – 8 (4%), Very low – 3 (6%)
Venue Quality: Very high – 12 (24%), High – 29 (57%), Neither – 7 (14%), Low – 3 (6%), Very low – 0 (0%)
Happy Hour Quality: Very high – 12 (25%), High – 12 (25%), Neither – 22 (46%), Low – 2 (4%), Very low – 0 (0%)

So everything was 50% or better “very high or high,” which seems good. We asked about favorite sponsors – ones mentioned by multiple participants include Cisco, Red Hat, NS1, VictorOps, Sumo Logic, xMatters, and Praecipio.

The comments were enlightening.  This year’s format was pretty divisive – there were lots of comments about liking voting on the talks and lots of comments about not liking it; there were lots of comments about liking e.g. “The new format with less vendor bloat” and then also lots of comments wanting sponsor booths back. And frankly, that’s what we expected – the new format was expressly designed to be attractive to some kinds of attendees and sponsors and not to others.

Overall, the positive comments predominated on the openspaces, keynotes, and ignites, and negative predominated on the talks and lack of booths.  (Several of those respondents identified as sponsors.)

Sponsor Survey Feedback

Total sponsor NPS was 7 ("good") from 14 respondents out of our 17 sponsors. Again, there wasn't the usual bell-curve distribution – some sponsors loved it and others hated it. The venue and the conversations people had onsite were rated very highly. The limited swag-table setup was rated low. The 30-minute suite sessions and lead quality were sharply bimodal – for example:

How did your 30 minute suite demo go?

  • Did not use 7.14%
  • Very well 7.14%
  • Well 28.57%
  • Neither poorly nor well 14.29%
  • Poorly 28.57%
  • Very poorly 14.29%

User Group Feedback

Read the board yourself! Attendees and some of the organizers were in attendance.


Analysis

Change is hard

People's expectations were hard to alter, especially in the sponsor realm, where the person who books the sponsorship usually isn't the person who comes on site. One sponsor comment said "Without a booth, not worth our $5000!" Well, yeah, that's why we didn't charge you $5k this year. For people who go to multiple DevOpsDays – especially sponsors, but even people who had just been to our event multiple years – we emailed and tweeted and blogged and put notes on the signup forms, but the changes were still a surprise to many. Voting on the talks was a concern not so much from speakers, but from people who "wanted their schedule set in advance!" and from people who were "afraid it makes speakers feel bad."

Money isn’t hard

Even with the much lower sponsor cost this year ($3k), and lowering our headcount significantly (to 400), and providing the same great venue and lunches and breakfasts and drinks and not one but two shirts and blowing it out on the happy hour, plus being ripped off by our happy hour venue (not going back there!!!), we were still far enough in the black that we're giving thousands of dollars to charity at the end of the event.

In fact, one of the advantages of this year's format was that we weren't giving a third of our tickets away for free to a huge army of organizers, speakers, etc. Adding more sponsor stuff requires adding more volunteers, which just eats back into the revenue stream again.

Specific Outcomes

Voting on talks

There was enough pushback that we won't do that next year. Submissions were lower this year, and a bunch of people dropped out before the event. However, many of the people who dropped out were, to be blunt, the people we wanted to drop out: talks "submitted on behalf of" someone, vendor roadshow talks.

Here's the thing – here in Austin, we're pretty blessed. We have a huge tech community with all the big players. If you want to "have your secretary submit your talk, fly in, drive to the venue, give your talk, fly out" – whoever you are, you really don't have anything more interesting to say than the people who are already here. So if your goal in being at DoDA isn't to interact with the community, we have plenty of talk submissions already, thanks. I get that if you're starting up a DoD in the middle of nowhere, the people on the "DevOps Talk Circuit" are key to bringing in new ideas and jumpstarting you, and I don't devalue that. But we don't need that here, and it doesn't serve the needs of our current community.

This isn’t to say people from away aren’t welcome – John Willis is from Atlanta but he’s part of our community, because when he comes here that’s how he interacts with us.  (One of the “What did you like the most” survey comments simply said “John Willis.”)

People suggested various half-measures – "have us vote a week before!" But the additional logistics on that are very much not worth it, especially given what we think we've learned about our talk needs – read on for that!

Sponsor tables

OK, no sponsor tables was not universally beloved. Some sponsors – and not just the “here for the leadz” sponsors we were deliberately discouraging with the format – didn’t like it because it was harder to interact with folks about their product. But – here’s the rub – we had just as many complaints last year when we *did* have sponsor tables!  “My table was in the corner.” “There wasn’t enough foot traffic driven to me.”

The stadium format is pretty "noisy," and if we brought sponsor tables back we'd have to put talks in some far-away rooms again; removing those rooms this year saved us a lot of money, and people always hated them anyway (like, FAR away).

Also, I’ll be honest, we had problems with sponsor misbehavior last year.  Silver sponsors claiming a table and standing behind it like a gold. Sponsors going out on the field (forbidden by UT). Sponsors trying to have food trucks park outside (also forbidden by UT police). Disruptive activity of a number of different sorts, requiring lots of work by organizers and volunteers and venue staff to deal with. I am sure many of them thought they were being “scrappy” etc. but in the end, we don’t get paid for this conference so we don’t need to put up with crap for it either. Discussion about “firing” certain sponsors was had.

We aren’t going back to the usual sponsor tables, but we are going to try something even more different – read on for that!

Boxed lunches

In early DoDA, we kept having super-deluxe Austin fare – BBQ, tex-mex – not from a caterer but from the real good places. This was for all the folks from away we were bringing in and wanted to show an Austin good time to!

Unfortunately, last year food lines for 650 people were a problem. Vendors weren’t adequately prepared with people or food.  We had to have many volunteers assigned. Food lines were super long and slow and a source of frustration.

This year we did have some comments along the lines of "I wanted the deluxe food." But they were far outweighed by those who appreciated being able to grab sustenance and get back to why they were here: learning and discussion. With enough money we may try to get some kind of super-deluxe boxed lunch, but the boxed lunches will stay.

Lower headcount

The lower headcount was universally beloved, except by lead generators and those who couldn't get a ticket. More and better interaction; many positive comments noted the more intimate conversations in the openspaces and the hallway track. Keep.

No streaming

Worked out great. No one complained, and the cost, the org/volunteer time, and the schedule and stage compromises we have to make for live streaming are immensely negative. Not going back.

2019 Planning

First of all, a disclaimer.  I am sharing this in the interests of transparency and helping other organizers learn from what we’ve done.  I don’t claim Austin is doing things the “one true way” and I know our community’s needs are different from many others. None of this is intended to denigrate any other events and their decisions. You don’t need to justify why you do things differently or why any of this isn’t right for your community.

Every year I start our planning with some basic questions.

  1. Do we want to have a DevOpsDays Austin next year?
  2. If so, why?  What is the goal of this year’s event?

“Inertia” is a bad reason to do anything.  We don’t have “money” as a reason because we have to spend what we get, we don’t pocket anything except some gifts. (My kid has already appropriated the bluetooth speaker I got this year…)

The group of organizers (over a tasty dinner at Chez Zee) decided “yes”, and after a good bit of discussion they decided that to us, this year, the goal of DevOpsDays Austin is to “Promote collaboration and sharing and networking specifically for the Austin technical community.” Now, that’s a pretty non-controversial statement on its face – but then as we plan stuff, we really test it against our goal and see if it supports it, is neutral, or takes away from it.  If it’s neutral or takes away, it goes.

This decision and clear statement (I think Marisa is the one who put it together for us) jogged my memory, and I pulled out our attendee survey comments. What did you like the most about DevOpsDays Austin 2018? "Ability to collaborate with others." "Enjoyed hearing what others were doing." "Focus on the community." "It's a well-run, intimate conference. I always see people I know." "The community involvement." Her sentence crystallized what people were telling us was their favorite part of the event – super!

OK, so what does that mean for each area?

Content

People love the lightning talks more than anything. Then the keynotes. Then the talks. It's why we tried the attendee voting. The discussion covered how many of the talks seem too long and boring even at 35 minutes, and how speakers who try to get too technical suffer from the audience not being able to follow along, given the screen size and the large group. People say they want themed tracks and such, but we rely on volunteers giving talks; we aren't buying these off the shelf somewhere ("Give me 6 Kubernetes talks, 6 DevOps culture talks, 6 DevOps manager talks, and 6 intermediate-level technical talks…"). We are still committed to multiple technical tracks (DoDA was the first DoD to do this; many are still uni-track) because we're 7 years in, we have a great diversity of experience in our community, and people don't want to sit through the same messaging again.

Some talks are beloved and others aren’t.  As we sifted through the details, one comment from “What can we do better” on the attendee survey came to me.  “Talks focused on ‘I am a _____, here’s the problem we had and how we solved it.’ I say that because one of the coolest, most useful talks I saw was the Coinbase engineer who described how he used EBS volumes creatively to solve their scaling problem.”

So we decided to retire the voting but heavily curate the talks. We don't want "whatever talk you're giving nowadays on the DevOps talk circuit" – we want talks in that format: the problem you had and how you solved it.

We’re working out the details, but we’re thinking about having these talks be more like 15 minutes long, with then linked openspaces that afternoon for the truly interested to get together and go ‘command line level’ with them.  This also allows for more breaks and collaboration time.

We also decided that idiosyncratic is better. A couple of the organizers got excited about a sports/fitness theme to align with the stadium: one wants to set up a 5K, another's wife teaches yoga classes and could run one, we could give Fitbits as speaker gifts… While I and the other Agile Admins have been filming lynda.com courses and doing other creative things, the advice we keep getting from producers and directors and content managers is "Use *your* voice. Do what *you* find interesting and other people will find it interesting." Andrew Shafer loves running Werewolf games at openspaces at conferences, and people really respond to it! So we're not going to hesitate to put in stuff we find interesting, and we figure that enthusiasm will draw others. Trying to give attendees a "standard conference experience" is severely counterproductive; there are plenty of regular conferences for people to go to, they get sick of them, and it doesn't fit the DevOpsDays ethos in the first place.

Sponsors

I challenged the group.  “Tell me why we should have sponsors at all?  Half our revenue was ticket sales and half was from sponsors.  If we double ticket prices to $400 – still very low for any 2-day conference in the world – we can just not take sponsors at all, done and done. If we needed their money it’d be one thing, but we don’t. Let them spend their ‘limited marketing budget’ on the DoD events that do need it. How do the sponsors contribute to our goal other than with funding?”

The immediate response was that there are a bunch of sponsors who *are* part of the community and interacting with them is important; we have loads of Amazon/Google/Atlassian/Oracle/etc hiring going on here for example, and folks who work for Chef and Salt and Puppet and so on in town… We want those folks to be part of the conversation.  Just not disrupt that conversation.  And, some people pay for those tickets out of pocket so having some money to defray attendee costs is good.

We decided to try something different – we are using the luxury boxes at the stadium more and more; they’re relatively inexpensive and we used them for all the openspaces and such this year.   What if, we said, we intersperse sponsor suites with openspace suites, maybe even have them host some of the openspaces, do their own presentations in there too for whoever’s interested?  This means a limited number of sponsor slots (no more than 10, possibly fewer), but a more premium experience right there where the action is happening. And target Austin-presence companies to let them know about it. They can also then get food/drink catered into their suites to bring people in even more.

Attendees

Keep the headcount low – at least our limit of 400 from this year, if not lower. Consider a "two-tier" ticket price, with one price if your company is paying and another if you are; Data Day Austin has used this format to good effect. It lets the non-backed solo folks in without breaking their bank, but lets companies that do send attendees pay a reasonable amount.

Venue

UT Stadium is great, we don’t really see a reason to do all the work to change if we’re not doing booths and we’re going with a suite strategy for sponsors. Plus we have developed great relationships with the venue staff.

Keep refining the AV experience but doing it ourselves – we bought equipment and have a large set of “A/V geeks” so we don’t need to have outside people do it.

Food

Stick with boxed lunches. Austinites have had enough BBQ and tex-mex, and this event is primarily for them, per our goal. The benefit of fast lunch and snacks was tremendous this year. We could spend more on boxes from premium vendors, but keep it boxed. Maybe do drink service ourselves, because we got truly rooked by the UT caterers on it this year. Though Rich said he found the place the athletes eat, and we might be able to get in on that… Keeping it fast, though, one way or the other.

Happy hour

We put a lot of work into this and spend double what the happy hour sponsor gives us each year, and then only half the people come and only half of those say they like it. This year we had unlimited food and booze at a venue with video games in it, for Pete's sake; I think we're done chasing the idea of the ultimate happy hour. Probably we'll do more of a short onsite sponsor room crawl at the venue, and then an "after party" we don't put as much money/work into. "A couple free rounds at Scholtz', get your own ass there."

Conclusion

All right, that's all the plan one dinner could get us. But in the end, we're happy with how the event went this year. We'll change a couple of the things that didn't work out – talk voting, no booths – but not back to the old way, because we already know that was suboptimal; instead we'll try more options! If none of your experiments fail, you're not being experimental enough, and we embrace that at DevOpsDays Austin.

Let us know your thoughts too!  Who are you, and what do you get or want to get out of DevOpsDays Austin?


Released! SRE, Monitoring and Observability!

Well, we haven't had a lot of spare blogging time, but we've been busy – the agile admins have been hard at work on some more LinkedIn Learning/lynda.com video courses to help make DevOps more accessible for the common man and woman.

James and I did DevOps Foundations: Site Reliability Engineering, which explains our view of what SRE is and its position in the DevOps arena. This rounds out our "DevOps 201" series, where we delve into the three practice areas of DevOps: continuous delivery, infrastructure as code, and SRE.

Monitoring is a big part of SRE, but too much to cover in the scope of one course – so at the same time, Peco and I filmed a companion course, DevOps Foundations: Monitoring and Observability! Like the CD and IaC courses, it alternates theory with demos.

Along with the recent DevOps Foundations: Lean and Agile I did with Karthik, we feel like we've now completed a curriculum that can introduce you to all the major areas of DevOps and prepare you to grow from there. (Link to lynda.com playlist.) The guys are doing other courses with lynda as well; we've kinda gotten addicted to doing them! You can check them all out by your favorite Agile Admin.

I love talking to the folks I meet at conferences and whatnot who have done the courses; let us know what other areas you’d like to get the Agile Admin learning treatment!

We’re not a consultancy or anything – just four practitioners here in Austin who love giving back to the community when we’re not doing our day jobs. We get some royalties per click from the classes, but other than that we don’t have anything to sell you. We got into this to help people make sense of the confusing DevOps landscape and we’ll keep doing it as long as it seems like it’s meeting that goal, so your feedback is needed to let us know if we should keep going and if so on what.


Monitoring and Observability

Ah, observability, the new buzzword of the day. Monitoring vendors aplenty are using the word to basically mean "better monitoring!" You know, #monitoringlove, not #monitoringsucks. Because monitoring doesn't help with debugging and doesn't have app instrumentation, right?

Well, I have to say “bah” to that.  So here’s the thing.  I’m an electrical engineer by education, and I spent a lot of time working at National Instruments, an engineering test and measurement company.  You may be surprised to know these terms have actual definitions that don’t require Twitter arguments to discover.

Monitoring is an activity you perform. It’s simply observing the state of a system over a period of time.

Why do we monitor? For three reasons, in general.

  • Problem Detection – you know, alerting, or seeing issues on dashboards.
  • Problem Resolution – root cause and troubleshooting.
  • Continuous Improvement – capacity planning, financial planning, trending, performance engineering, reporting.

How do we monitor?  Well, that’s called instrumentation. You can instrument your systems and get CPU and stuff, you can use synthetic probes, you can use JavaScript bugs to get end user monitoring, you can emit metrics from applications, you can introspect services and apps via whatever parts are exposed (from JMX to nginx stats to sysdig traces), you can take network traces… (Some folks are similarly trying to redefine “instrumentation” to just mean application instrumentation, which is lame, and in defiance of the fact that application performance management tools that do app instrumentation have existed for decades.)

You can instrument metrics or events; metrics have certain sampling frequency and resolution…
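To make the "emit metrics from applications" option concrete, here's a tiny sketch using the prometheus_client library (just one of many instrumentation libraries; the metric names, labels, and port are invented for illustration):

```python
# Sketch: application-emitted metrics, exposed for a monitoring system to scrape.
# Metric names, labels, and the port are illustrative, not prescriptive.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("myapp_requests_total", "Requests handled", ["status"])
LATENCY = Histogram("myapp_request_latency_seconds", "Request latency in seconds")

def handle_request():
    with LATENCY.time():                 # record how long the "work" took
        time.sleep(random.uniform(0.01, 0.1))
    REQUESTS.labels(status="200").inc()  # count it, tagged by status

if __name__ == "__main__":
    start_http_server(8000)              # metrics scrapeable at :8000/metrics
    while True:
        handle_request()
```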

So what is observability?  This isn’t a new term. It comes from system control theory. You know, the stuff that makes your A/C system and electrical plants and your car work.

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.

Observability is a property of a system. You can monitor a system using various instrumentation, but if the system doesn’t externalize its state well enough that you can figure out what’s actually going on in there, then you’re stuck.
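For the control-theory-minded, that property has a precise textbook test; here's a sketch of the standard linear time-invariant formulation (general theory, nothing specific to any tool discussed here):

```latex
% Classical observability test for a linear time-invariant system
% \dot{x} = Ax + Bu,\quad y = Cx + Du, with an n-dimensional state x:
\mathcal{O} =
\begin{bmatrix}
  C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1}
\end{bmatrix},
\qquad
\text{the system is observable} \iff \operatorname{rank}(\mathcal{O}) = n
```

In other words: the external outputs carry enough information to reconstruct the internal state, which is exactly the point being made here about software systems.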

So is observability hippy bullcrap?  No, of course not. In a DevOps world, it’s very important that the apps and systems concentrate on making themselves both observable and controllable (I leave it to the reader to research controllability, unless I get agitated enough to post about that too). Do you make yourself “easy to monitor”?

Externalizing custom metrics contributes to observability (you know, like with dropwizard metrics).  So does good logging.  So does proper architecture!  Take a system that sticks all kinds of messages into one message queue rather than using separate queues for separate types – the latter is more observable; you can more readily see how many of what is flowing through.  (It’s more controllable too, as you can shut off one queue or another.)
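To make the queue example concrete, here's a toy sketch (the message type names are invented) of why per-type queues give you that visibility essentially for free:

```python
# Sketch: one queue per message type makes "how many of what is in flight"
# directly visible; a single mixed queue hides that breakdown.
from queue import Queue

queues = {"orders": Queue(), "emails": Queue(), "audit": Queue()}  # illustrative types

def enqueue(msg_type: str, payload: dict) -> None:
    queues[msg_type].put(payload)

def queue_depths() -> dict:
    # The observability win: depth per message type, for free.
    return {name: q.qsize() for name, q in queues.items()}

enqueue("orders", {"id": 42})
enqueue("emails", {"to": "ops@example.com"})
print(queue_depths())  # e.g. {'orders': 1, 'emails': 1, 'audit': 0}
```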

Making your system observable is therefore important, so that if you monitor it with appropriate instrumentation, you understand the state of the system and can make short or long term plans to change it.

While a monitoring tool can definitely contribute to this via its innovation in instrumentation, analysis, and visualization, in large part observability is a battle won or lost before you start sticking tools on top of the system. It’s very important to take it into account when designing and implementing services. No tool is going to “give you” observability and that’s the usual silver bullet fallacy heard from someone who wants to sell you something.

I'm not saying every vendor is using the term wrongly (in fact I just came across this New Relic post that is very well done), but I have to say I am less than impressed when common engineering terms are so widely misused and misunderstood in our industry.

Would you like to know more?  Peco and I are working on a new lynda.com course on monitoring and observability!  There’ll be real engineering, a broad canvas of the different kinds of monitoring instrumentation, tips on implementation and use… We’ve both been using and/or building monitoring tools for decades now so we hope to have some useful info for you.
