Monthly Archives: October 2010

A DevOps Manifesto

They were talking about a DevOps manifesto at DevOpsDays Hamburg, and it got me to thinking: what’s wrong with the existing agile development manifesto?  Can’t we largely adopt it as a unifying guiding principle?

Go read the top level of the Agile Software Development Manifesto.  What does it look like from a systems point of view instead of a pure code developer point of view?  An attempt:

DevOps Manifesto

We are uncovering better ways of running
systems by doing it and helping others do it.
Through this work we have come to value:

Individuals and interactions over processes and tools
Working systems over comprehensive documentation
Customer and developer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on
the right, we value the items on the left more.

That’s not very different.  It seems to me that some of the clauses we can accept without reservation.  As a systems guy I do get nervous about individuals and interactions over processes and tools (does that promote the cowboy mentality?).  That’s not to say automation and tooling are bad – in fact they’re necessary; look at agile software development practice, there are clearly processes and tools – but the people involved should always be the primary concern.  IMO this top level agile call to arms has nothing to do with dev or ops or biz; it’s a general template for collaboration.  And how “DevOps” is it to have a different rallying point for ops vs dev?  Hint: not very.

Then you have the Twelve Principles of Agile Software.  Still very high level, but here’s where I think we start having concerns unique to ops that fall outside the existing list.  Let me take a shot:

Principles Behind the DevOps Manifesto

We follow these principles:

Our highest priority is to satisfy the customer
through early and continuous delivery
of valuable functionality. (More general than “software”.)

Software functionality can only be realized by the
customer when it is delivered to them by sound systems.
Nonfunctional requirements are as important as
desired functionality to the user’s outcome. (New: why systems are important.)

Infrastructure is code, and should be developed
and managed as such. (New; see the sketch after this list.)

Welcome changing requirements, even late in
development. Agile processes harness change for
the customer’s competitive advantage. (Identical.)

Deliver working functionality frequently, from a
couple of weeks to a couple of months, with a
preference to the shorter timescale. (Software -> functionality.)

Business people, operations, and developers must work
together daily throughout the project. (Add operations.)

Build projects around motivated individuals.
Give them the environment and support they need,
and trust them to get the job done. (Identical.)

The most efficient and effective method of
conveying information to and within a development
team is face-to-face conversation. (Identical.)

Working software successfully delivered by sound systems
is the primary measure of progress. (Add systems.)

Agile processes promote sustainable development.
The sponsors, developers, operations, and users should be able
to maintain a constant pace indefinitely.  (Add operations.)

Continuous attention to technical excellence
and good design enhances agility. (Identical.)

Simplicity–the art of maximizing the amount
of work not done–is essential. (Identical – KISS principle.)

The best architectures, requirements, and designs
emerge from self-organizing teams. (Identical.)

At regular intervals, the team reflects on how
to become more effective, then tunes and adjusts
its behavior accordingly. (Identical.)

That’s a minimalist set.
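Since “infrastructure is code” is the most novel principle in the list, here’s a rough sketch of what it means in practice: the server’s configuration lives in source control, and a script can converge the box to a known state at any time.  (This is a hypothetical bash fragment for a RHEL-style box – the package and paths are just examples, not our actual setup.)

#!/bin/bash
set -e                                                   # Stop on any error
# Converge a web server to the state described in version control.
rpm -q httpd > /dev/null 2>&1 || yum -y install httpd    # Install Apache only if it's missing
cp ./httpd.conf /etc/httpd/conf/httpd.conf               # Deploy the config file kept in source control
chkconfig httpd on                                       # Make sure Apache starts on boot
service httpd restart                                    # Pick up the new config

Because a script like that is safe to re-run, rebuilding a server stops being a day of hand work and becomes a checkout and a command.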

Does this sound like it’s putting software first still?  Yes.  That’s desirable.  Systems are there to convey software functionality to a user, they have no value in and of themselves.  I say this as a systems guy.  However, I did change “software” to “functionality” in several places – using systems and stock software (open source, COTS, etc.) you can deliver valuable functionality to a user without your team writing lines of code.

Anyway, I like the approach of inserting ourselves into the existing agile process, as opposed to some standalone “devops manifesto” that ends up reopening a lot of the questions the original agile manifesto already answers (like “What about the business?”).  I think one of the main points of DevOps is simply that “Hey, the concepts behind agile are sound, but y’all forgot to include us Ops folks in the collaboration – understandable, because ten years ago most apps were desktop software, but now that many (most?) are services, we’re an integral part of creating and delivering the app.”

Thoughts?

20 Comments

Filed under DevOps

Do You Need Ops?

Ok, so this is old, but I hadn’t read it before.  InfoQ hosted “What is the Role of an Operations Team in Software Development Today?”  The premise: you don’t need ops any more!  DevOps means your developers can do ops.  Ta da.

Well, besides the separation of duties problems, this has a number of fundamental flaws.  The first is the sheer amount of knowledge required and work to be done.  One of the greatest difficulties in hiring Web Ops people is finding the wide generalist/specialist skill set they need – the whole first chapter of Web Operations is Theo Schlossnagle talking about that.  The more skill sets you pack into one person, the less good they are at each of them.  A good developer needs a huge skill set; so does a good operations person.  If you add “app server administration” to a dev, they’re going to have to “forget” Spring or something to make room, in a virtual sense.  Sure, you can take a developer and teach them ops – it’s not totally foreign, but that’s because all these people come out of the same CS/MIS programs in the first place, duh.  You can take a Flash developer and teach them embedded development too, but I think everyone understands what a fundamental retooling that is.

So is this just an idea from someone who has no idea what all Operations folks do?  Maybe.  I know I had one discussion inside our IT department with a development architect who, bridling at our concerns with a portal project, said “What do you people do anyway?  Why do we need your team?  You just move files around all day!”  It’s the classic “I don’t know what all that job entails, so it must be easy” syndrome. But our systems team has a huge amount of institutional knowledge around APM, security, management, etc – heck, we try to spread it into the dev teams as much as we can, but there’s a lot.  It’s similar to QA – sure, “developers can do their own testing” – but doing good load testing etc. is a large field of endeavor unto itself.  If all testing is left to devs, you don’t get good testing.  Doesn’t mean devs shouldn’t test, or write unit tests – they are a necessary but not sufficient part of the testing equation.

But you know, I would argue that from a certain point of view, maybe this is right.

Infrastructure = code, right?  And if you are far down the path of automation and system modeling, then you can redefine Ops as just a branch of development.  One guy on the team knows SQL, another knows .NET, and another knows Apache config and Amazon AMIs.  One tester knows how to do functional regression tests, another knows load tests, and another knows performance, security, and reliability testing.  From a certain point of view these are all simply different technical skills in one big bag of skills, so a systems engineer who configures WebLogic is just a developer who knows WebLogic as one of their tools.  I think there’s a lot of truth to this, really, and part of our DevOps implementation here has focused on mainstreaming ops into the same agile tracking tool, bug tracking, and processes as our developers.

However, this misses another huge part of the equation: mixing reactive support and proactive work is a time-killer that causes context switching and thrash, which degrades efficiency.

Even in our Web admin team, we separated into a “systems engineering” group and a “production support” group.  The former worked on projects with developers writing new code, and the latter handled pages, requests, etc. around the running systems.  We did it because the interrupt-driven operations work absolutely killed people’s ability to execute on projects.  There’s a great section in the O’Reilly book Time Management for System Administrators that prescribes rotating ops/support duty among admins to reduce exactly that problem.

Many developers don’t understand a running system.  It’s been interesting being in R&D now at NI, where a lot of the development is desktop software driven – for a long time we ran these public facing Web demos where the R&D engineers would say, with a straight face, “You log into Windows, click to run this app, and then lock the screen.”  Even the idea of running as a service was weird hoodoo.

Anyway, in IT here the apps teams have split out as well!  There are App Ops groups that offload CI and production issues from the “main” App Dev groups; the systems engineers work more with the main app dev groups, and the production support team works with the app ops groups.  And believe me, there’s enough work to go around.

Now, of course you need developers involved in the operational support of your apps.  That’s part of the value of DevOps – it’s not all “Ops needs to learn new stuff,” it’s also “Devs need to be involved in support.”  But in the end, those are huge areas where “do it all” is not meaningful.  Developers helping with production support is a necessary but not sufficient part of operations.

4 Comments

Filed under DevOps

Cloud Security: A Chicken in Every Pot and a DMZ for Every Service

There are a couple of military concepts that have bled into technology, and in particular into IT security – a DMZ being one of them.  A Demilitarized Zone (DMZ) establishes control over what comes in and what goes out between two parties; in military terms, a DMZ lets you establish a “line of control.”  In a human-run DMZ, its controllers make ingress (incoming) and egress (outgoing) decisions based on an approved list – no one is allowed to pass unless they are on the list and have proper identification and approval.

In the technology world, the same thing is done with traffic between computers.  Decisions to allow or disallow traffic can be made based on where it came from (origination), where it is going (destination), or dimensions of the traffic like size, length, or even time of day.  The basic idea is that all traffic is analyzed and either allowed or disallowed according to defined rules; just as in a military DMZ, there is a line of control where only approved traffic may ingress or egress.  In many instances a DMZ will protect you from malicious activity like hackers and viruses, but it also protects you from configuration and developer errors – it can ensure, for example, that your production systems are not talking to your test or development tiers.

Let’s look at a basic tiered web architecture.  A corporation that hosts its own website will more than likely have the following four components: incoming internet traffic, a web server, a database server, and an internal network.  To create one or more DMZs to handle its web traffic, it would want to make sure that someone from the internet can talk only to the web server, that only the web server can talk to the database server, and that the internal network is inaccessible to the web server, the database server, and internet traffic.

Using firewalls, you would need to set up at least the following three to adequately control those DMZs:

1. A firewall between the external internet and the web server
2. A firewall in front of the internal network
3. A firewall between your web servers and database server

Of course these firewalls need to be configured so that they allow (or disallow) only certain types of traffic.  Only traffic that meets defined rules based on its origination, destination, and dimensions will be allowed.
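As a rough illustration, here’s what firewall #3 could look like as iptables rules on a Linux database server.  (A sketch only – the 10.0.1.10 web server address is a made-up example, and 3306 assumes MySQL.)

iptables -P INPUT DROP                                             # Default deny: drop anything not explicitly allowed
iptables -A INPUT -i lo -j ACCEPT                                  # Permit local loopback traffic
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT   # Permit replies to connections this host initiated
iptables -A INPUT -p tcp -s 10.0.1.10 --dport 3306 -j ACCEPT       # Only the web server may reach the database port

Same line-of-control idea as the military version: a short approved list, and everything else is turned away.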

Sounds great, right?  The problem is that firewalls have become quite complicated and now sometimes aren’t even advertised as firewalls, but instead are billed as a network-device-that-does-anything-you-want-that-can-also-be-a-firewall-too.  This is due in part to faster hardware, shrinking IT budgets, and scope creep.  The “firewall” now handles VPN, traffic acceleration, IDS duties, deep packet inspection, and making sure your employees aren’t watching YouTube videos when they should be working.  All of those are great, but they make firewalls expensive, fragile, and difficult to configure.  And to have firewalls watching all ingress and egress points across your network, you usually have to buy several devices to scatter throughout your topology.

Another recurring problem is that most firewall analysis and implementation is done with the intent of securing the perimeter.  That makes sense, but it often stops there and leaves the interior parts of your network unprotected.  IT security firms that do consulting and penetration tests don’t generally come in through the “front door” – by that I mean they don’t generally try to get in through the front-facing web servers.  Instead they go through other channels such as dial-in, wireless, partners, third-party services, social engineering, FTP servers, or that demo system that was set up five years ago and that no one has taken down – you know the one I am talking about.  Once inside, if there are no well-defined DMZs, it is pretty much game over, because at that point there are no additional security controls.  A DMZ will not fix all your problems, but it provides an extra layer of protection against malicious activity.  And as I mentioned earlier, it can also help prevent configuration errors from crossing between dev, test, and prod.

In short, a DMZ is a really good idea and should be implemented for every system you have.  The ideal DMZ would be a firewall in front of each service, applying rules that determine what traffic is allowed in and out.  That, however, is expensive to set up and very rarely gets implemented.  But that was the old days – and the good news is, the cloud has an answer.

I am most familiar with Amazon Web Services, so here is an example of how to do this with security groups on AWS.  The following commands create a web server group and a database server group, and allow the web servers to talk to the database on port 3306 only.

ec2-add-group web-server-sec-group -d "this is the group for web servers" #This creates the web server group with a description
ec2-add-group db-server-sec-group -d "this is the group for db server" #This creates the db server group with a description
ec2-authorize web-server-sec-group -P tcp -p 80 -s 0.0.0.0/0 #This allows external internet users to talk to the web servers on port 80 only
ec2-authorize db-server-sec-group --source-group web-server-sec-group --source-group-user AWS_ACCOUNT_NUMBER -P tcp -p 3306 #This allows only traffic from the web server group on port 3306 (mysql) to ingress

Under the above example, the database server is in a DMZ: only traffic from the web servers is allowed to ingress.  The web server is likewise in a DMZ, in that it is protected from the internet on all ports except 80.  If you implemented this for every role in your system, you would in effect have a DMZ between each layer, which provides excellent security protection.
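And when you do need administrative access, you punch a similarly narrow hole instead of opening things wide – for example, SSH to the web servers only from your corporate network (the 203.0.113.0/24 block below is a placeholder range, not a real one):

ec2-authorize web-server-sec-group -P tcp -p 22 -s 203.0.113.0/24 #This allows SSH to the web servers from the corporate network only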

The cloud seems to get a bad rap in terms of security.  I would counter that in some ways the cloud is more secure, since it lets you actually implement a DMZ for each service.  Sure, security groups won’t do deep packet analysis or replace an intrusion detection system, but they do let you specifically define what ingress and egress is allowed on each instance.  We may never get a chicken in every pot, but with the cloud you can now put a DMZ on every service.

Leave a comment

Filed under Cloud, Security

Why SSL Sucks

How do you make your Web site secure?  “SSL!” is the immediate response.  It’s a shame there are so many pain points around actually using SSL.

  • It slows things down.  On ni.com we bought a hardware SSL accelerator, but in the cloud we can’t do that.
  • Cert management.  Everyone uses loads of VIPs, so you end up needing to get wildcard certs.  But then you have to share them – we’re a big company, and don’t want to give the main wildcard cert out to multiple teams.  And you can’t get extended validation (EV) on wildcard certs.
  • UI issues.  We just tried turning SSL on for our internal wiki.  But anytime a page includes any non-https element, warnings pop up.  If there’s a form field that submits to a non-https search, warnings pop up.  Hell, sometimes you just follow a link to a non-https page, and warnings pop up.
  • CAs.  We have an internal NI CA and try to have some NI-signed certs, but of course you have to figure out how to get them into every browser in your enterprise.
  • It’s just absurdly complicated.  Putting a cert on Apache is a pretty well-worn path (sketched below), but recently I was trying to set up encrypted replication for MySQL and OpenDS and Jesus, doing anything other than the default self-signed cert is hell on earth.  “Oh, is that in the right format wallet?”
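For contrast, the “well-worn” Apache path boils down to a couple of openssl commands and a few config directives – roughly like this (a sketch: the file names and paths are examples, and your CA’s enrollment process will vary):

openssl genrsa -out server.key 2048              # Generate a 2048-bit RSA private key
openssl req -new -key server.key -out server.csr # Generate a signing request to send to your CA
# The CA sends back server.crt; then in the Apache config:
#   SSLEngine on
#   SSLCertificateFile /etc/pki/tls/certs/server.crt
#   SSLCertificateKeyFile /etc/pki/tls/private/server.key

Now compare that to figuring out whatever keystore, wallet, or encoding every other product wants.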

The result is that SSL’s suckiness ends up driving behavior that degrades security.  People learn to just “accept the exception” any time they hit a site that complains about an invalid cert.  We decided to remove SSL from most of our internal wiki and leave it only on the login page, to avoid all the UI issues.  And we couldn’t secure our replication, due to a combination of bugs (OpenDS secure replication works – until you restart any server, that is, and then it’s broken permanently) and the sheer hassle.

In general, little to no usability work has been put into the cert/encryption area, and that is why so few people use it.  PGPing your email is only for gearheads.  Hell, you have to transform key formats just to use PuTTY to SSH into Amazon servers.  Stop the madness.
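For instance (assuming the standard PuTTY tool set, with hypothetical file names), converting the PEM key Amazon hands you into PuTTY’s own format takes a separate utility:

puttygen my-ec2-key.pem -o my-ec2-key.ppk #Convert Amazon's PEM private key into PuTTY's .ppk format

One key, two formats, and an extra step nobody discovers until their login fails.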

If the world really gave a crap about encryption, then your public key could be attached to your Facebook profile, for example, and people’s mail readers could pull it in automatically to validate signatures.  “Key exchange” isn’t harder than the other kinds of more-convenient information exchange that happen all the time on the Net.  And you could take a standard cert in whatever format your CA gives it to you, feed it into any software in an easy and standard way, and have it start encrypting its communication.

Me to world – make it happen!

7 Comments

Filed under Security

What’s a “DevOp”?

I ran across an interesting post by Dmitriy Samovskiy about the difference between a DevOp and a sysadmin, and it stirred up some thoughts I’ve had about the classification of different kinds of sysadmin types and the confusion that the “Ops” part of “DevOps” sometimes causes.

I think that using “DevOp” as a job or role name isn’t a good idea; what the term really indicates is that there are two major classes of technical role:

  • Devs – people who work with code mostly
  • Ops – people who work with systems mostly

You could say a “DevOp” is someone who does some of both, but I think the preferred usage is that DevOps, like agile, is about a methodology of collaboration.  A given person having both skill sets or fulfilling both roles doesn’t require a special term; it’s like anyone else in IT wearing multiple hats.

Of course, inside each of these two areas is a wide variety of skills and specialized roles.  Many of the people talking about “DevOps” are in five-person Web shops, in which case “Ops” is an adequate descriptor of “all the infrastructure crap guy #5 does.”

But in larger shops, you start to realize how many different roles there are.  In the dev world, you get specialization, from UI developers to service developers to embedded developers to algorithm developers.  I’d tend to say that even Web design/development (HTML/CSS/JS) and QA are often considered part of the “dev side of the house.”  It’s the same in Ops.

Now, traditionally many “systems” teams, also known as “infrastructure” teams, have been divided up purely by technology silo.  You have teams for UNIX, Windows, storage, network, database, security, etc.  This approach has its strengths, but it also has critical weaknesses – which is why ITIL, for example, has been urging people to reorganize along “services you are delivering” lines.

In the dev world, you don’t usually see tech silos like that.  “Here’s the C programmer department, the Java programmer department, the SQL programmer department…  Hand your specs to all those departments and hope you get a working app out of it!”  No, everyone knows intuitively that’s insane.  But largely we still do the same thing in traditional systems teams (“Ops” umbrella).

So historically, the first solution that emerged was a separate kind of group.  Here at NI, the first was a “Web Ops” group called the Web Admins, which was formed ten years ago when it became clear that running a successful Web site cannot be done by bringing together fractional effort from various tech silos.  The Web Admins work with the developers and the other systems teams – the systems teams do OS builds, networking, rack-and-jack, storage/data center, etc. and the Web Admins do the software (app servers, collab systems, search, content management, etc.), SaaS, load balancing, operational support, release management, etc.  Our Web Admin team ended up expanding very strongly into the application performance management and Web security areas because no one else was filling them.

In more dotcommey companies, you see the split between the “IT group” and the “Engineering” or “Operations” group that supports the product – two entirely different beasts.

Anyway, the success of this team spawned others, so now there are several teams we call “App Admins” here at NI that perform this same role of sitting between the developers and the “system admins.”  To make it more complicated, even some of the apps (“Dev”) teams are spawning “App Ops” teams that handle CI work and production issue escalation, freeing up the core dev teams for more large-scale projects.  Our dev teams are organized around line of business (ecommerce, community, support, etc.), so they find that helpful.  (I’ll note that the interface between line-of-business organization and technology-silo organization is not an easy one.)

Which of these teams are the “DevOps”?  None of them.  Naturally, the teams more in the middle feel the need for it more, which is why I, as a previous manager of the Web Admins, am the primary evangelist for DevOps in our organization.  The “App Admins” and the new “App Ops” teams work a lot more closely together on “operational” issues.

But this is where the term “Ops” has bad connotations.  In my mind “operations,” as closely related to “support,” is about the recurring activities around the runtime operation of our systems and apps.  In fact, we split the Web Admin team into two sub-teams: an “operations” team handling requests, monitoring, releases, and other interrupt-driven activity, and a “systems” team that does systems engineering.  The interface between systems engineering and the core dev teams is just as important as the interface around runtime – even more so, I would say – and is where a lot of the agile development/agile infrastructure methodology bears the most fruit.  Our systems engineering team is involved in projects alongside the developers from their initiation, and it influences the overall design of the app/system.  (Side note: I wish there were a word that captured “both app and system” well; when you say “system,” people sometimes take it to mean both and sometimes just the infrastructure.)  And *that’s* DevOps.

Heck, our DBA team is split up even more – at one point they had a “production support” team, a “release” team, an “architecture” team, and a “projects” team.

But even among the back end systems teams, there are those that have more of a culture of collaboration – “DevOps,” you might call it – and they are more of a pleasure to interface with; and then there are those that do not, that focus on process over people, you might say.  I am down with the “DevOps” term just because it has branding buzz around it, but I think it really is just a sexier way to say “agile systems administration.”

On a related note, I’ve started to see job postings go by for “DevOps Engineers” and the like.  I think that’s OK to some degree, because it does differentiate the likely operating environment of those jobs from all the noise posted as “UNIX Engineer III.”  But if you are using “DevOps” as a job description, you need to be pretty clear in your posting about exactly what skills you mean, because of this confusion.  Do you just want a jack of all trades who can write Java/C# code as well as do your sysadmin work, because you’re cheap?  Or do you want a sysadmin who can script and automate stuff?  Or do you want someone who will be embedded on project teams, understand their business requirements, and help them accomplish them?  Those are all different things with different skill sets behind them.

What do you think?  It seems to me we don’t really have a good understanding of the taxonomy of the different kinds of roles within Ops, and that confuses the discussion of DevOps significantly.  Is DevOps a name for, a description of, or a prescription for some specific sub-team?  And which one – production support, systems engineering?  Does internal IT count, or is it just “product” support orgs for SaaS?

5 Comments

Filed under DevOps

Innotech Austin Coming Up Oct 28!

The main Austin technology conference, Innotech, is October 28th!  It’s a good opportunity to see who’s up to what in the Austin area.  And cloud is hot on the session list.  We’re going – who else is?

Leave a comment

Filed under Conferences