Tag Archives: Management

Log Management Tools

We’re researching all kinds of tools as we set up our new cloud environment, I figure I may as well share for the benefit of the public…

Most recently, we’re looking at log management.  That is, a tool to aggregate and analyze your log files from across your infrastructure.  We love Splunk and it’s been our tool of choice in the past, but it has two major drawbacks.  One, it’s  quite expensive. In our new environment where we’re using a lot of open source and other new-format vendors, Splunk is a comparatively big line item for a comparatively small part of an overall systems management portfolio.

Two, which is somewhat related, it’s licensed by amount of logs it processes per day.  Which is a problem because when something goes wrong in our systems, it tends to cause logging levels to spike up.  In our old environment, we keep having to play this game where an app will get rolled to production with debug on (accidentally or deliberately) or just be logging too much or be having a problem causing it to log too much, and then we have to blacklist it in Splunk so it doesn’t run us over our license and cause the whole damn installation to shut off.  It took an annoying amount of micromanagement for this reason.

Other than that, Splunk is the gold standard; it pulls anything in, graphs it, has Google-like search, dashboards, reports, alerts, and even crazier capabilities.

Now on the “low end” there are really simple log watchers like swatch or logwatch.  But we’d really like something that will aggregate ALL our logs (not just syslog stuff using syslog-ng – app server logs, application logs, etc.), ideally from both UNIX and Windows systems, and make them usefully searchable.  Trying to make everything and everyone log using syslog is an ever receding goal.  It’s a fool’s paradise.

There’s the big appliance vendors on the “high end” like LogLogic and LogRhythm, but we looked at them when we looked at Splunk and they are not only expensive but also seem to be “write only solutions” – they aggregate your logs to meet compliance requirements, and do some limited pattern matching, but they don’t put your logs to work to help you in your actual work of application administration the dozen ways Splunk does.  At best they are “SIEM”s – security information and event managers – that alert on naughty intruders.  But with Splunk I can do everything from generate a report of 404s to send to our designers to fix their bad links/missing images to graph site traffic to make dashboards for specific applications for their developers to review.  Plus, as we’re doing this in the cloud, appliances need not apply.  (Ooo, that’s a catchy phrase, I’ll have to use that for a separate post!)

I came across three other tools that seem promising:

  • Logscape from Liquidlabs – does graphing and dashboards like Splunk does.  And “live tail” – Splunk mysteriously took this out when they revved from version 3 to 4!  Internet rumor is that it’s a lot cheaper.  Seems like a smaller, less expensive Splunk, which is a nice thing to be, all considered.
  • Octopussy – open source and Perl based (might work on Windows but I wouldn’t put money on it).  Does alerting and reporting.  Much more basic, but you can’t beat the price.  Don’t think it’ll meet our needs though.
  • Xpolog – seems nice and kinda like Splunk.  Most of the info I can find on it, though, are “What about xpolog, is good!” comments appended to every forum thread/blog post about Splunk I can find, which is usually a warning sign – that kind of guerrilla marketing gets old quick IMO.  One article mentions looking into it and finding it more expensive but with some nice features like autodiscovery, but not as open as Splunk.

Anyone have anything to add?  Used any of these?  We’ve gotten kind of addicted to having our logs be immediately accessible, converted into metrics, etc.  I probably wouldn’t even begrudge Splunk the money if it weren’t for all the micromanagement you have to put into running it.  It’s like telling the fire department “you’re licensed for a maximum of three fires at a time” – it verges on irresponsible.


Filed under DevOps

How to hire an Agile Admin

The Kitchen Soap blog has some great interview questions for hiring a WebOps position. Check it out, it is worth the read.

In my experience with hiring, a real simple one is to ask (while holding their resume), “Can you tell me about yourself?” Sure, I can read what it says, but letting them verbalize usually is a good indicator. In one of the last sets of interviews I did I asked a candidate this question and I got a gruff response, “It is all right there, what do you need to know?” Good communication skills? No, see you later.

One other question I like is, “What are two character flaws you have?” Usually someone prepares for one in advance with something like, “I am an over-committed worker…” or other statement that is meant to actually show a positive side about them. Asking for two lets you watch for quick thinking and (again) communication skills. In our industry technical is a must, but people can be trained. If you are bad at communicating or just a jerk, then no amount of training can help.

Anyone else have some good interview Q’s?


Filed under General

Book Review: Smart & Gets Things Done, by Joel Spolsky

Joel Spolsky is a bit of an internet cause célèbre, the founder of Fog Creek Software and writer of joelonsoftware.com, an influential programming Web site.

The book is about technical recruiting and retention, and even though it’s a small format, under 200 page book, it covers a lot of different topics.  His focus is on hiring programmers but I think a lot of the same principles apply to hiring for systems admin/Web systems positions.  Hiring has been one of the hardest parts of being a Web systems manager, so I got a lot out of the book and tried putting it into practice.  Results detailed below!

The Book

The first chapter talks about the relative effectiveness of programmers.  We often hire programmers and pay the good ones 10% more than the bad ones.  But he has actual data, drawn from a Yale professor who repeatedly teaches the same CS class and assigns the same projects, which shows something that those of us who have been in the field for a long time know – which is that the gap in achievement between the best programmers and the worst ones is a factor of ten.  That’s right.  In a highly controlled environment, the best programmers completed projects 3-4 times faster than the average and 10x faster than the slowest ones.  (And this same relationship holds when adjusting for quality of results.)  I’ve been in IT for 15 years and I can guarantee this is true.  You can give the same programming task to a bunch of different programmers and get results from “Here, I did it last night” to “Oh, that’ll take three months.”  He goes on to note other ways in which you can get 10 mediocre programmers that cannot achieve the same “high notes” as one good programmer.  This goes to reinforce how important the programmer, as human capital, is to an organization.

Next, he delves into how you find good developers.  Unfortunately, the easy answers don’t work.  Posting on monster.com or craigslist gets lots of hits but few keeps.  Employee referrals don’t always get the best people either.  How do you find people, then?  He has three suggestions.

  1. Go to the mountain
  2. Internships
  3. Build your own community

“Go to the mountain” means to figure out where the smart people are that you want to hire, and go hang out there.  Conferences.  Organizations.  Web sites.  General job sites are zoos, you need venues that are more specifically spot on.  Want a security guy?  Post on OWASP or ISSA forums, not monster.com.

We do pretty well with internships, even enhancing that with company sponsored student sourcing/class projects and a large campus recruiting program.  He has some good sub-points however – like make your offers early.  If you liked them as an intern, offer them a full-time job at that point for when they graduate, don’t wait.  Waiting puts you into more of a competitive situation.  And interns should be paid, given great work to do, and courted for the perm job during the internship.

Building a community – he acknowledges that’s hard.  Our company has external communities but not really for IT.  For a lot of positions we should be on our our forums like fricking scavengers trying to hire people that post there.

Continue reading


Filed under General