Why A HTTP Sniffer Is Awesome

While looking at Petit for my post on log management tools, I was thrilled to see it link to a sniffer that generates Web type logs called Justniffer.  Why, you might ask, isn’t that a pretty fringe thing?  Well settle in while I tell you why it’s bad ass.

We used to run a Web analytics product here called NetGenesis.  Like all very old Web analytics products, it relied on you to gather together all your log files for it to parse, resulting in error prone nightly cronjob kinds of nonsense.  So they came out with a network sniffer that logged into Apache format, like this does apparently.  It worked great and got the info in realtime (as long as the network admins didn’t mess up our network taps, which did happen from time to time).

I quickly realized this sniffer was way better than log aggregation, especially because my environment had all kinds of weird crap like Domino Web servers and IIS5 that don’t log in a civilized manner.  And since it sat between the Web servers and the client, it could log “client time,” “server time”, and had a special “900” error code for client aborts/timeouts.  I self-implemented what would be a predecessor to todays’ RUM tools like Tealeaf and Coradiant on it.  We used it to do realtime traffic analysis, cross-site reporting, and even used it for load testing as we’d transform and replay the captured logs against test servers. Using it also helped us understand the value of the Steve Souders front end performance stuff when he came around.

Eventually our BI folks moved to a Javascript page tag based system, which are the modern preference in Web analytics systems.  Besides the fact that these schemes only get pages that can execute JS and not all the images and other assets, we discovered that they were reasonably flawed and were losing about 10% of the traffic that we were seeing in the network sniffer log.  After a long and painful couple months, we determined that the lost traffic was from no known source and happened with other page tag based systems (Google Analytics, etc.), not just this supplier’s tool, and the BI folks finally just said “Well…  It gives us pretty clickstreams and stuff, let’s go ahead with it.”  Sadly that sunset our use of the Netgenesis network sniffer and there wasn’t another like it in the open source realm (I looked).  Eventually we bought a Coradiant to do RUM (the sales rep kept trying to explain this “new network RUM concept” to us and kept being taken aback and how advanced the questions were we asked) but I missed the accessibility of my sniffer log…  Big log aggregators like Splunk help fill that gap somewhat but sometimes you really want to grep|cut|sort|uniq the raw stuff.

On the related topic of log replayers, we have really wanted one for a long time.  No one has anything decent.  We’ve bugged every supplier that we deal with on any related product, from RUM to load testing to whatever.  Recording a specific transaction and using that is fine, but nothing compares to the demented diversity of real Internet traffic.  We wrote a custom replayer for our sniffer log, although it didn’t do POST (didn’t capture payloads – looks like justniffer can though!) and got a lot of mileage out of it.  Found al ot of app bugs before going to production with that baby.  Anyway, none of the suppliers can figure it out (Oracle just put together a DB traffic version of this in their new version 12 though).  Now that there’s a sniffer we can use, we already have a decent replayer, we’re back in business!  So I’m excited, it’s a blast from the past but also one of those core little things that you can’t believe there isn’t one of, and that empowers someone to do a whole lot of cool stuff.

Leave a comment

Filed under DevOps

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.