Here’s a DevOps 101 presentation I’m delivering tomorrow at Innotech San Antonio, based on the definition of DevOps here at The Agile Admin, as part of a devops.com effort to spread DevOps learning to IT and the enterprise. (You probably want to go view it on slideshare.com so you can read the notes, too…)
@bridgetkromhout spoke on “how I learned to stop worrying and love devops” and @benzobot spoke on onboarding and mentoring apprentices. DoD SV certainly made a strong effort to get more female speakers this year! We tried in Austin (I personally wrote like every local techie woman group I could find) but we only had like one.
Then there were two super bad ass presentations back to back. I can’t find the slides online yet.
The Future of Configuration Management
Mark Burgess (@markburgess_osl), aka “The CFEngine Guy” and noted Promise Theory advocate, spoke. Chef and Puppet eclipsed CFEngine for a while, but now that the Internet of Things, containers, and the like are arriving, it turns out many of his design decisions may have been prescient rather than retro. Here it is broken down into wise sayings.
- Why do we not have CAD for IT systems?
- Orchestration is not bricklaying.
- We need the equivalent of style sheets for servers.
- We are entering a world of decentralized smart infrastructure.
- Scale, complexity, and knowledge increase as our desire for flexibility increases.
- Separation of concerns adds complexity and fragility.
- To handle complexity – atomize and untether.
- 3D printed datacenters are coming.
DevOps as Relationship Management
James Urquhart (@jamesurquhart) spoke about the interconnectedness of our systems. After the flash crash, the SEC added circuit breakers, defined rollback protocols, and inserted agents into the flow of the stock exchange trading systems to prevent uncontrolled cascades.
One simple rule: visualize the whole system (monitor your outside relationships) but take action at the agent level. “How are you doing today?” “Good.” Monitoring is going well; new approaches in the space look at policies, interactions, performance, and business metrics, but we need to differentiate reductionist vs. expansionist approaches.
Michael Nygard’s book Release It! is full of great patterns, and Netflix’s open-sourced Hystrix is an example of the kind of relational system safeguards you can build from it.
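The circuit breaker at the heart of Hystrix (and of Release It!) is easy to sketch. This is a minimal Python illustration, not Hystrix’s actual API, and the failure threshold and cool-down values are assumptions: after N consecutive failures the breaker “opens” and fails fast, and after a cool-down it lets calls through again, closing on a success and re-opening on a failure. Refusing calls outright is what keeps one sick dependency from cascading through everything else, the same idea as the SEC’s trading circuit breakers.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying it."""


class CircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures.
    OPEN -> HALF_OPEN once `reset_timeout` seconds have passed.
    In HALF_OPEN, a success closes the breaker and a failure re-opens it."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of piling more load onto a struggling dependency.
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re-)open the circuit
            raise
        # Any success resets the consecutive-failure count and closes the circuit.
        self.failures = 0
        self.opened_at = None
        return result
```

The injectable `clock` is only there so the state transitions can be tested without sleeping.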
- Tips for Introverts (at Conventions) by Tom Duffield – They include find a role, don’t fear failure, attend preconference activities, go to lunch early and sit, engage, share interests, find a comfortable setting, take time to recharge. As someone initially introverted myself (no one believes that now) I like that this has actual tips to get past it; in some circles “introversion” has become the new “Asperger’s” as a blanket excuse for not wanting to bother to relate to people.
- Mike Place on scalable container management, with Google’s Kubernetes as an example. Don’t just provision your systems; you need to manage them too. Images came, went, and have now come back, but you also can’t ignore what’s on board the image. It’s time to join image and config management.
This was really good and the world should listen. On the one hand, conducting CM operations on 1000 servers in parallel is contributing unnecessarily to the heat death of the universe. On the other hand, you need to build those images in a non-manual way in the first place! And too many systems worry about the configuration but not the runtime operation. Amen brother!
- Finally (well, there were two more, but I didn’t care for them so took no notes), John Willis (@botchagalupe) did [Darwin to] Deming to DevOps, a burst-fire reading list of nondeterminism tracing from Darwin through various scientists to the Deming/TPS stuff through into the DevOps world with Gene Kim and Patrick Debois. It was pimp. Here it is when he gave it at another venue:
Here are some big themes from the week.
- Deterministic, reductionist, and centralized are for suckers.
- Complexity is the enemy. Systems thinking is necessary.
- We love continuous deployment. But DevOps is not just about delivering code to production.
- Women exist in DevOps and are cool. More would be great.
- Most vendors have figured out to just relax and talk to techies in a way they might listen to. Some haven’t.
It was a great event, kudos to Marius and the other organizers who put in a lot of work to wrangle 500 people, nearly 30 sponsors, food, venue, and the like. If you haven’t been to a DevOpsDays, look around, there may be one near you! I help organize DevOpsDays Austin (we just had our third annual) and there are ones coming this year from Tel Aviv to Minneapolis.
If you went to DoD SV, feel free to comment below with your thoughts (linking any posts you’ve made, slides, etc. is welcome too)!
I have more notes from Velocity, but thought I’d do DevOpsDays first while it’s freshest in my brain. This isn’t a complete report, it’s just my thoughts on the parts I felt moved to actually write down or gave me a notable thought. More notes when I was learning, less when I wasn’t (not a reflection on the quality of the talk, just some things I already knew a bit about).
DevOpsDays Silicon Valley 2014 was June 27-28 at the San Jose Computer Museum. 500 people registered; not sure how many showed but I’d guess definitely in excess of 400.
State of the Union
First we had John Willis (@botchagalupe) giving the DevOps State of the Union. Here are the slides (I know it says Amsterdam; he gave it there too). This consisted of two parts. The first was a review of Gene Kim et al.’s 2014 State of DevOps Report; go download it if you haven’t read it, it’s great stuff.
The second part is about how we are moving towards software defined everything: robust, API-driven abstractions decoupled from the underlying infrastructure. John’s really into software defined networking right now, as it’s one of the remaining strongholds of static-suckiness in most infrastructures. A shout-out to the blog at networkstatic.net and to tools like Mesos and Google’s Kubernetes that are making computing even more fluid (see this article for some basics). “Consumable, composable infrastructure.”
Next, our favorite Kanban expert Dominica DeGrandis (@dominicad) spoke on “Why Don’t We Just Say No?” Here are the slides. As a new product manager, and as a former engineering manager who had engineers that would just take on work till they burst even with me standing there yelling “No! Don’t do it!”, I find it an interesting topic.
Why do you take on more work than you have capacity to do? She cites The Book of No by Susan Newman, Ph.D., and a very recent Psychology Today “Caveman Logic” post called Why So Many People Just Can’t Say No. She proposes that it is easier for devs to say no; ops have more pressing demands and are forced into too much yes. Some devs took exception to this on twitter (“our product people make us do all kinds of stuff we don’t like to”), but I think that’s different from the main point here. It’s not that “you have to do something you don’t like and are overruled when you say no” but that “you become severely over-committed due to requests from many quarters and being unwilling to say no.”
She goes through a great case study of changing over a big ops shop to a more modern “SRE” model that handles both interrupt and project work, by getting metrics, having a lower WIP limit, closing out >90-day-old tickets, and saying no to non-emergency last-minute requests. In fact, the latter is why I prefer scrum over kanban for operations so far – she contended that devs have an easier time saying no to interrupt work because of the sprint cadence. OK, so adopt a sprint cadence! Anyway, by having some clear definitions of done for workflow stages they managed to improve the state of things considerably. Use kaizen. The book about the Pixar story, Creativity, Inc., talks about how the Pixar folks were running themselves ragged to try to finish Toy Story 2, till someone left their baby in a car because they were too frazzled. “Asking this much of people, even when they wanted to give it, was not acceptable.” What should your WIP level be? The level of “personal safety” would be a great start!
It’s interesting – I did some of these things at Bazaarvoice and tried to do some other ones too. But oftentimes the resistance would come from the very engineers the current process was working to death. “We can’t close those old tickets! They have valuable info and analysis and it’s something that needs to happen!” “Yes, but our rate of work done and rate of work intake prove mathematically that they’ll never get done. Keeping them open is therefore us making a false promise to whoever logged those tickets.” Not everyone is able to ruthlessly apply logic to problems – you’d think that would be an engineer attribute, but in my experience it’s not really any more common than in the general population. But given that “not acceptable” quote above, I really struggled with how to get engineers who were burning themselves out to quit it. It’s harder than you’d think.
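The “proves mathematically” argument really is just arithmetic on two rates. Here’s a rough sketch with made-up numbers. The Little’s Law helper is a standard queueing result I’m adding to connect this to the WIP limits above, and it assumes a roughly stable system.

```python
from typing import Optional


def weeks_to_clear(backlog: int, intake_per_week: float, done_per_week: float) -> Optional[float]:
    """Weeks until the backlog hits zero, or None if it never will."""
    net_burn = done_per_week - intake_per_week
    if net_burn <= 0:
        return None  # intake >= throughput: the queue never drains
    return backlog / net_burn


def littles_law_lead_time(wip: float, throughput_per_week: float) -> float:
    """Little's Law: average time in the system = average WIP / average throughput."""
    return wip / throughput_per_week


# Hypothetical team: 400 open tickets, finishing 25 a week.
print(weeks_to_clear(400, intake_per_week=30, done_per_week=25))  # None: never done
print(weeks_to_clear(400, intake_per_week=20, done_per_week=25))  # 80.0 weeks
# Same throughput, lower WIP limit: each item waits half as long.
print(littles_law_lead_time(wip=50, throughput_per_week=25))  # 2.0 weeks
print(littles_law_lead_time(wip=25, throughput_per_week=25))  # 1.0 weeks
```

If intake exceeds throughput, every ticket you keep open is the false promise described above; if it doesn’t, the numbers at least tell you how long the promise really is.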
Agile at Scale
Next was a fascinating case study of Capital One’s transformation to an agile, BDD, devops-driven environment, given by Adam Auerbach (@bugman31). The slides are available on Slideshare. They used the Scaled Agile Framework (SAFe) and BDD/acceptance test driven development with Cucumber, as well as continuous integration. In a later openspace there were people from Amex, city/state/federal governments, etc. trying to do the same thing – Agile and DevOps aren’t just for the little startups any more! He reported that it really improved their quality. Hmm, from the Googles it looks like the consulting firm LitheSpeed was involved; I met one of their principals at Agile Austin and he really impressed me.
Sales and Marketing Too
Sarah Goff-DuPont (@devtoolsuperfan) spoke about having sales and marketing join the agile teams as well. Some tips included cross-pollinating metrics and joining forces on customer outreach.
Just some quick thoughts from the day one Ignites.
- @eriksowa on OODA and front end ops and screaming at your team in German (I am in favor of it)
- Aater (futurechips) on data acquisition and multitenancy with docker
- Jason Walker on LegoOps
- Ho Ming Li on Introducing DevOps
- @seemaj from Enstratius on classic to continuous delivery – slides. Pretty meaty with lots of tool shout-outs – grunt, bower, angular, yo, bootstrap, grails, chef, rundeck, hubot, etc. I don’t mind a good laundry list of things to go find out more about!
- Matt Ho on Docker+serf – with Docker there is a service lookup challenge. AWS tagging is a nice solution to that. Serf does the same for Docker, like a peer-to-peer ZooKeeper. Then he used Mustache to generate configs. This is worth looking at – I am a big fan of this approach (we did it ourselves at National Instruments years ago) and I frankly think it’s a crime that the rest of the industry hasn’t woken up to it yet.
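Here’s a rough sketch of the generate-configs-from-membership idea. It uses Python’s `string.Template` rather than Mustache, and the hard-coded member list is a hypothetical stand-in for what Serf (or AWS tags) would report; the field names and the nginx upstream target are assumptions for illustration.

```python
from string import Template
from typing import Dict, List

# Hypothetical membership data standing in for a discovery tool's output.
MEMBERS: List[Dict[str, str]] = [
    {"name": "web-1", "addr": "10.0.0.11", "role": "web", "status": "alive"},
    {"name": "web-2", "addr": "10.0.0.12", "role": "web", "status": "alive"},
    {"name": "web-3", "addr": "10.0.0.13", "role": "web", "status": "failed"},
    {"name": "db-1", "addr": "10.0.0.21", "role": "db", "status": "alive"},
]

UPSTREAM = Template("upstream $service {\n$servers}\n")
SERVER_LINE = Template("    server $addr:$port;\n")


def render_upstream(members: List[Dict[str, str]], role: str, port: int) -> str:
    """Render an nginx-style upstream block from the live members with a given role."""
    live = [m for m in members if m["role"] == role and m["status"] == "alive"]
    # Sort so the output is stable and the config only changes on real membership changes.
    servers = "".join(SERVER_LINE.substitute(addr=m["addr"], port=port)
                      for m in sorted(live, key=lambda m: m["addr"]))
    return UPSTREAM.substitute(service=role, servers=servers)


print(render_upstream(MEMBERS, "web", 8080))
```

In a real setup you’d rerun the render whenever membership changes and reload the proxy only if the output differs.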
If you haven’t done openspaces before, it’s where attendees pitch topics and the group self-organizes into a schedule around them. Here’s some pics of part of the resulting schedule:
I went to two. The first was a combination of two openspace pitches, “Enterprise DevOps” and “ITIL, what should it be?” This was unfortunately a bad combination. Most folks wanted to talk about the former, and the Capital One guy was there and people from Amex etc. were starting to share with the group. But the ITIL question was mostly driven by a guy from the company that “bought ITIL” from the UK government and he had a bit of a vendory agenda to push. So most of the good discussion there happened between smaller groups after it broke up.
The second was a CI/CD pipeline one, and I got this great pic of what people consider to be “the new standard” pipeline.
Next, Day 2 and wrap-up!
The third annual DevOpsDays conference in Austin will be May 5-6 (Cinco de Mayo!) at the Marchesa, where it was held last year! As many of you know, the DevOpsDays conferences are a super popular format – half talks from practitioners, half openspaces, all fun – held in many cities around the world since the first one in Ghent launched the DevOps movement proper.
- You can register – all the early bird tickets are sold out but the regular ones are only half gone.
- You can also propose a talk! There are 35-minute full talk slots, but we’re even more in need of 5-minute Ignite!-style lightning talks! The RFP ends 3/26.
- You can sponsor! The Gold sponsorships are half gone already. And we have some special options this year…
DevOpsDays Austin has been bigger and better every year since its inception and should have something good for everyone this year. Come out and join your comrades from the trenches who are trying to forge a new way of delivering and maintaining software!
One of the interesting sessions at ReInvent was a fireside chat with Werner Vogels, where CEOs or CTOs of different companies/startups who use AWS talked about their applications/platforms and what they liked and wanted from AWS. It was a three-part series with different folks; I was able to attend the first one, but I’m guessing videos of the others are available online. It was an interesting session, giving the audience a window into the way C-level people think about problems and solutions…
First up, the CTO of MongoDB…
Lots of people use Mongo to store things like user profiles for their applications. Mongo performance has gotten a lot better because of SSDs.
They recently raised $150 million and want to build out a lot of tools to administer Mongo better.
Apparently being a MongoDB DBA is a really high-paying job these days!
User roles may be available in Mongo next year to add more security.
Werner and Eliot want to work together to bring a hosted version of Mongo, like RDS.
Next up, Twilio’s Jeff Lawson.
Jeff is ex-Amazon.
Software people want building blocks and not some crazy monolithic thing to solve a problem. Telecom had this issue, and that is why I started Twilio.
Everyone is agile! We don’t have answers up front, but we figure out these answers as we go.
Started with voice, then moved to SMS, followed by a global presence. Most of our customers didn’t want boundaries; they just wanted an API to communicate with their customers.
Werner: It’s hard to run an API business. Tell us more…
Lawson: It is really hard. APIs are kind of like web apps when it comes to scaling. REST helps a lot from this perspective. Multitenancy issues get amplified when you have an API business.
Twilio apparently deploys 20 times a day. AWS really helps with deployment because you can bring up brand-new environments that look exactly like prod and then tear them down when they aren’t needed.
When it comes to APIs, we write the documentation first and show it to our customers before actually implementing the API. Then iterate, iterate, iterate on the development.
Jeff asks: make it easier to get a VPC up and running.
Next up: Valentino with AdRoll (real-time bidding)
There’s a data collection pipeline that takes in about 20 TB of data every day.
Latency is king: typical latency is between 50 ms and 100 ms. This is still a lot for us. I wish we had more transparency when it comes to latency, inside AWS and otherwise…
Why DynamoDB? We didn’t find anything simple at the time, and it was nice to be able to scale something without having to worry about it. We had zero ops people at the time to work on scaling.
Read/write rates: 80k reads per second (not strongly consistent), 40k writes per second.
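For a sense of scale, here’s what those rates would translate to in DynamoDB provisioned capacity. This uses DynamoDB’s current documented rules (one read capacity unit covers one strongly consistent read per second, or two eventually consistent reads, of up to 4 KB; one write capacity unit covers one write per second of up to 1 KB), which may not match what AdRoll was paying for at the time. I’m assuming “not strongly consistent” means eventually consistent reads, and the 1 KB item size is made up.

```python
import math


def read_capacity_units(reads_per_sec: int, item_kb: float, eventually_consistent: bool) -> int:
    """RCUs: 1 unit = 1 strongly consistent read/s (or 2 eventually consistent) of <= 4 KB."""
    total = reads_per_sec * math.ceil(item_kb / 4)
    if eventually_consistent:
        total = math.ceil(total / 2)  # eventually consistent reads cost half
    return total


def write_capacity_units(writes_per_sec: int, item_kb: float) -> int:
    """WCUs: 1 unit = 1 write/s of an item of <= 1 KB."""
    return writes_per_sec * math.ceil(item_kb / 1)


print(read_capacity_units(80_000, item_kb=1.0, eventually_consistent=True))  # 40000
print(write_capacity_units(40_000, item_kb=1.0))  # 40000
```

Note the asymmetry: half as many writes as reads end up needing the same capacity, and write costs grow faster as items get bigger because write units are sized at 1 KB.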
Why Erlang? You’re a Python god.
I started out in Python with the Twisted framework, but I realized that Python didn’t fit our use case well; the Twisted system worked just as well, but it would be complicated to manage and needed some hacks.
Today it would be hard to pick between Erlang and Go…
I didn’t cover the day 1 keynote, but fortunately it can be found here. The day 2 keynote was a lot more technical and interesting though. Here are my notes from it:
First, we began by talking about how AWS plans its projects.
Before any project is started, while teams are still in the brainstorming phase and before any code is written, a few key things are always done:
- Meeting minutes
- Figure out the UX
“2 Pizza Teams”: small autonomous teams that have roadmap ownership, with decoupled launch schedules.
Get the functionality into the hands of customers as soon as possible. It may be feature-limited, but it’s in the hands of customers so that you can get feedback as soon as possible. Iterate, iterate, iterate based on feedback. This is different from the old guard, where everything is engineering-driven and unnecessarily complex.
Netflix is on stage and we’re talking about the Netflix Cloud Prize and the enhancements to the different tools… looks pretty cool, and I will need to check them out. There are now 14 Chaos Monkey “tests” to run instead of just 1 from before.
Werner is back and is breaking down the different facets that AWS focuses on:
- Performance: measure everything; put performance data in log files that can be mined.
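A minimal sketch of “put performance data in log files that can be mined”, assuming one JSON object per line with hypothetical `op` and `ms` fields and a simple nearest-rank percentile. This is an illustration of the idea, not an AWS tool or log format.

```python
import io
import json
import math
import time
from contextlib import contextmanager
from typing import Dict, Iterable, Iterator, List, TextIO


@contextmanager
def timed(log: TextIO, operation: str) -> Iterator[None]:
    """Time the enclosed block and append one JSON line describing it to `log`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.write(json.dumps({"op": operation, "ms": round(elapsed_ms, 3)}) + "\n")


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def mine(lines: Iterable[str], pct: float = 99.0) -> Dict[str, float]:
    """Group logged timings by operation and return the requested percentile of each."""
    by_op: Dict[str, List[float]] = {}
    for line in lines:
        record = json.loads(line)
        by_op.setdefault(record["op"], []).append(record["ms"])
    return {op: percentile(sorted(times), pct) for op, times in by_op.items()}


# Usage: log a few timed calls to an in-memory "file", then mine it.
log = io.StringIO()
for _ in range(5):
    with timed(log, "get_user"):
        sum(range(10_000))
log.seek(0)
print(mine(log))
```

Logging raw per-request timings (rather than pre-averaged numbers) is what makes later mining for tail latency possible.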
Ilya Sukhar, CEO of Parse (a platform for mobile apps), is on stage now.
- Parse Data: store data; it’s 5 lines of code instead of a bunch of code.
Parse started with 1 AWS instance.
From 0 to 180,000 apps.
180,000 collections in MongoDB; he showed the differences between pre- and post-PIOPS (Provisioned IOPS).
IAM and IAM roles to set boundaries on who can access what.
How to do this from a db perspective?
Apparently you can have fine-grained access control on DynamoDB instead of writing your own code.
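That fine-grained access control is expressed as an IAM policy condition on `dynamodb:LeadingKeys`, which limits a federated user to items whose partition key matches their own ID. Here the policy is built as a Python dict; the table ARN, account number, and action list are placeholders, and `${www.amazon.com:user_id}` is the policy variable for Login with Amazon (other identity providers use different variables).

```python
import json
from typing import Any, Dict, List


def per_user_table_policy(table_arn: str, actions: List[str]) -> Dict[str, Any]:
    """Build an IAM policy that only allows access to items whose partition
    (hash) key equals the caller's federated user ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": [table_arn],
                "Condition": {
                    "ForAllValues:StringEquals": {
                        # Policy variable for Login with Amazon web identity
                        # federation; other identity providers use other variables.
                        "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                    }
                },
            }
        ],
    }


# Placeholder ARN: region, account number, and table name are hypothetical.
policy = per_user_table_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/UserProfiles",
    ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
)
print(json.dumps(policy, indent=2))
```

The app then needs no per-user authorization code of its own for that table: IAM rejects any request whose key doesn’t match the caller.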
Each data block is encrypted in Redshift.
Talking about how customers are using spot instances to save money.
WeTransfer use case: they take care of transferring large files.
Airbnb on stage with Mike Curtis, VP of engineering
- 350k hosts around the world
- 4 million guests (Jan 2013)
- 9 million guests today
A host of AWS services:
- 1,000 EC2 instances
- Millions of RDS rows
- 50 TB of photos in S3
Airbnb runs all of this with a 5-person ops team.
That helps them devote resources to the real problem.
Dropcam came on stage after that to talk about how they use the AWS platform. Nothing too crazy, but interestingly, more inbound video is sent to Dropcam than to YouTube!
The keynote ended with an Amazon Kinesis demo (and a deadmau5 announcement for the replay party). From the outside, Kinesis looks like a streaming API with different ways to process the data on the backend. A prototype that streamed data from Twitter and performed analytics on it was shown to demonstrate the service.
- RDS for PostgreSQL
- New instance types: I2, for much better I/O performance
- DynamoDB: global secondary indexes!
- Federation with SAML 2.0 for IAM
- Amazon RDS: cross-region read replicas!
- G2 instances for media- and video-intensive applications
- C3 instances: new, with the fastest processors (2.8 GHz Intel Xeon E5 v2)
- Amazon Kinesis: real-time processing, fully managed. It looks like this will help you solve scalability issues when you’re trying to build real-time streaming applications. It integrates with storage and processing services.
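Kinesis decides where a record goes by hashing its partition key with MD5 into a 128-bit integer and finding the shard whose hash key range contains it, so records that share a key land on the same shard, in order. A small sketch of that mapping, assuming the shards split the hash space evenly (in a real stream the ranges come from the stream’s description and change after resharding):

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer with MD5, read as big-endian."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard owning this key, assuming evenly split hash key ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


# Every record for this user goes to the same shard.
print(shard_for("user-42", 4))
```

The design consequence: per-key ordering is free, but a single very hot partition key can only ever use one shard’s throughput.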
In case you want to watch it, the day 2 keynote is here: http://www.youtube.com/watch?v=Waq8Y6s1Cjs
And also, the day 1 keynote: http://www.youtube.com/watch?v=8ISQbdZ7WWc
This was the first talk by @simon_elisha that I went to at ReInvent, and it was a packed room. It was targeted at developers taking an app from inception to 10 million users. Following are the notes I took…
– The first issue, when you start seeing traffic to an application, is that you’ll need a bigger box. A single box is an anti-pattern because there’s no failover, etc. Move your DB off the web server; you could use RDS or something similar too.
– SQL or NoSQL?
Not a binary decision; maybe use both? A blended approach can reduce technical debt. Maybe just start with SQL because it’s familiar and there are clear patterns for scalability. NoSQL is great for super-low-latency apps, metadata data sets, fast lookups, and rapid data ingestion.
So for 100 users…
You can get by using Route 53, ELB, and multiple web instances.
For 10000 users…
– Use CloudFront to cache any static assets.
– Get your session state out of the web servers. Session state could be stored in DynamoDB because it’s just unrelated data.
– It might also be time for ElastiCache now, which is just hosted Redis or Memcached.
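Getting session state out of the web servers is what lets any instance behind the load balancer serve any request. A minimal sketch: `InMemorySessionStore` is a stand-in for DynamoDB or ElastiCache (a real one would make network calls), and the `WebServer` class and its methods are hypothetical.

```python
import secrets
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    def get(self, session_id: str) -> Optional[Dict[str, str]]: ...
    def put(self, session_id: str, data: Dict[str, str]) -> None: ...


class InMemorySessionStore:
    """Stand-in for a shared store such as DynamoDB or ElastiCache."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, str]] = {}

    def get(self, session_id: str) -> Optional[Dict[str, str]]:
        item = self._items.get(session_id)
        return dict(item) if item is not None else None

    def put(self, session_id: str, data: Dict[str, str]) -> None:
        self._items[session_id] = dict(data)


class WebServer:
    """A stateless web node: all session data lives in the shared store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, user: str) -> str:
        session_id = secrets.token_hex(16)
        self.store.put(session_id, {"user": user})
        return session_id

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.store.get(session_id)
        return session["user"] if session else None


shared = InMemorySessionStore()
web_a, web_b = WebServer("web-a", shared), WebServer("web-b", shared)
sid = web_a.login("alice")
print(web_b.whoami(sid))  # alice: a different server sees the same session
```

Because no server holds anything it can’t lose, auto scaling can add or kill instances freely.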
Set min and max server counts (Auto Scaling) across multiple Availability Zones. AWS makes this really simple.
If you end up at the 500k users situation you probably really want:
– metrics and alarms
– automated builds and deploys
– centralized logging
Must-have metrics to collect:
– host level metrics
– aggregate level metrics
– log analysis
– external site performance
Use a product for this, because there are plenty available, and you can focus on what you’re really trying to accomplish.
Create tools to automate things so you save time, especially time spent managing your infrastructure. Some you can use are Elastic Beanstalk and AWS OpsWorks (more for developers), and CloudFormation and raw EC2 (more for ops). The key is to be able to repeat those deploys quickly. You will probably need to use Puppet or Chef to manage the actual EC2 instances.
Now you probably need to redesign your app when you’re at the million-user mark. Think about using a service-oriented architecture. Loose coupling for the win, instead of tight coupling. You can probably put a queue between two pieces.
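A minimal in-process illustration of putting a queue between two pieces, using Python’s `queue.Queue` and a worker thread as stand-ins for a hosted queue such as SQS and a separate worker fleet; the job names are made up.

```python
import queue
import threading
from typing import List, Optional


def worker(jobs: "queue.Queue[Optional[str]]", results: List[str]) -> None:
    """Consume jobs until a None sentinel arrives; the producer never waits on this."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results.append(f"processed:{job}")
        jobs.task_done()


jobs: "queue.Queue[Optional[str]]" = queue.Queue()
results: List[str] = []
consumer = threading.Thread(target=worker, args=(jobs, results))
consumer.start()

# The front end only enqueues and moves on; it doesn't know who does the work.
for order_id in ["order-1", "order-2", "order-3"]:
    jobs.put(order_id)

jobs.put(None)  # tell the worker to stop
consumer.join()
print(results)  # ['processed:order-1', 'processed:order-2', 'processed:order-3']
```

The two sides now share only the message format: either can be scaled, redeployed, or briefly down without the other noticing, as long as the queue holds.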
Key tip: don’t reinvent the wheel.
Example of what to do when you have a user uploading a picture to a site:
Simple Workflow Service (SWF)
– Workers and deciders: provides orchestration for your code.
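Here’s a toy model of the worker/decider split for the photo-upload example. This is not the SWF API: the activity names, the `s3://bucket/...` paths, and the `run_workflow` loop (which plays the role of the orchestration service) are all hypothetical. The point is that the decider only reads history and picks the next step, while each worker only does its one job.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical activities for the photo-upload example; each worker does one job.
WORKERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "store_original": lambda ctx: f"s3://bucket/originals/{ctx['photo']}",
    "make_thumbnail": lambda ctx: f"s3://bucket/thumbs/{ctx['photo']}",
    "update_metadata": lambda ctx: f"indexed {ctx['photo']}",
}

PLAN = ["store_original", "make_thumbnail", "update_metadata"]


def decide(history: List[str]) -> Optional[str]:
    """Decider: look only at what has completed and pick the next activity.
    Returns None when the workflow is finished."""
    for activity in PLAN:
        if activity not in history:
            return activity
    return None


def run_workflow(ctx: Dict[str, str]) -> List[str]:
    """Stand-in for the orchestration service: alternate decider and worker turns."""
    history: List[str] = []
    results: List[str] = []
    while (next_activity := decide(history)) is not None:
        results.append(WORKERS[next_activity](ctx))
        history.append(next_activity)
    return results


print(run_workflow({"photo": "cat.jpg"}))
```

Because the decider is driven purely by history, a crashed decider can be replaced and simply re-read where things stand.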
When your data tier starts to break down (5 to 10 million users):
– Federation: split by function or purpose.
Gotcha: you will have issues with join queries.
– Sharding: this works well for one table with billions of rows.
Gotcha: operationally confusing to manage.
– Shift to NoSQL: sorta similar to federation.
Gotcha: crazy architecture change. Use DynamoDB.
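A minimal sketch of the sharding option, assuming hypothetical shard hostnames, string user IDs, and plain hash-modulo routing (one of several schemes). It also shows why joins get painful: any query spanning many users has to fan out across shards.

```python
import hashlib
from typing import Dict, List

# Hypothetical shard hostnames.
SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def shard_for_user(user_id: str, shards: List[str] = SHARDS) -> str:
    """Route a user to a shard with a fixed digest. Python's built-in hash() of a
    str is randomized per process, so app servers would disagree if we used it."""
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def group_by_shard(user_ids: List[str]) -> Dict[str, List[str]]:
    """A query across many users must fan out: one sub-query per shard,
    with any join or merge done in the application afterwards."""
    groups: Dict[str, List[str]] = {}
    for uid in user_ids:
        groups.setdefault(shard_for_user(uid), []).append(uid)
    return groups


print(group_by_shard(["alice", "bob", "carol", "dave"]))
```

With modulo routing, adding a fifth shard remaps most users, which is one reason sharding is operationally confusing; consistent hashing or a lookup table reduces that churn.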