The 1.0 release of Google Chrome has everyone abuzz. Here at NI, loads of people are adopting it. Shortly after it went gold, we started to hear from users that they were having problems with our internal collaboration solution, which is based on the Atlassian Confluence wiki product. They’d hit a page and get a terse error; clicking “More Details” gave the slightly more helpful, or at least Googleable, string “Error 320 (net::ERR_INVALID_RESPONSE): Unknown error.”
At first, it seemed that reloading or clearing the cache made the problem go away. That turned out not to be a real fix: we have two load-balanced servers in a cluster serving this site. One server worked in Chrome and the other didn’t; reloading or otherwise breaking persistence just got you the working server for a time. Both servers, though, worked perfectly in IE and Firefox (every version we have lying around).
So we started researching. Both servers were as identical as we could make them. Was it a Confluence bug? No, we have phpBB on both servers and it showed the same behavior – so it looked like an Apache level problem.
Sure enough, the logs had a clue. The failing hits didn’t generate an Apache error; they were still logged as 200 OK responses. But when I compared the log lines, the box that Chrome was erroring on showed that the cookie wasn’t being passed up: that field was blank (it was populated with the cookie value on the other box, and on both boxes when hit in IE/Firefox). Both boxes had an identically compiled Apache 2.0.61. I diffed all the config files; except for boxname and IP, there was no difference. The problem persisted for more than a week.
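If you want to run the same comparison on your own boxes, here is a minimal Python sketch of the check I did by eye. It assumes an access log format that ends with the cookie as a quoted field (something like `%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" "%{Cookie}i"`); that format, the function names, and the idea of treating both an empty field and Apache's `-` placeholder as "blank" are my assumptions for illustration, so adjust the pattern to whatever your servers actually log.

```python
import re
from collections import Counter

# Assumed LogFormat (hypothetical, not taken from our real config):
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" "%{Cookie}i"
# Quoted fields may contain backslash-escaped characters.
QUOTED = r'(?:[^"\\]|\\.)*'
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>' + QUOTED + r')" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>' + QUOTED + r')" '
    r'"(?P<agent>' + QUOTED + r')" '
    r'"(?P<cookie>' + QUOTED + r')"$'
)


def browser_family(agent):
    """Rough browser guess from the User-Agent string."""
    # Chrome's User-Agent also mentions Safari, so test for Chrome first.
    if "Chrome" in agent:
        return "Chrome"
    if "MSIE" in agent:
        return "IE"
    if "Firefox" in agent:
        return "Firefox"
    return "other"


def blank_cookie_counts(lines):
    """Count 200 OK hits per browser whose cookie field is blank."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # line does not fit the assumed format
        if match.group("status") != "200":
            continue
        if match.group("cookie") in ("", "-"):
            counts[browser_family(match.group("agent"))] += 1
    return counts
```

Run it against the access log from each server separately (for example `blank_cookie_counts(open(path))` with your own log path) and compare the per-browser counts. Keep in mind that first-time visitors legitimately have no cookie, so a few blanks on either box are normal; it is a lopsided Chrome count on one box that matches what we saw.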
We did a graceful Apache restart for kicks, with no effect. Desperate, we did a full Apache stop/start, and the problem disappeared! I’m not sure for how long. If it recurs, I’ll take a packet trace and see whether Chrome is just not sending the cookie, or sending it partially, or sending it fine and it’s Apache jacking it up… But it’s strange that there would be an Apache-end problem that only Chrome would experience.
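For when that packet trace happens, here is a rough Python sketch of the check I have in mind. It assumes you have already pulled the raw bytes of one HTTP request out of the capture (for instance by following the TCP stream in your capture tool); the function name `inspect_cookie` and the returned fields are made up for this example, and it only separates "no Cookie header", "Cookie header present", and "request headers cut off before the blank line".

```python
def inspect_cookie(raw_request: bytes) -> dict:
    """Report whether a raw HTTP request carries a Cookie header.

    raw_request is assumed to be the reassembled bytes of a single
    request as seen on the wire (hypothetical input for this sketch).
    """
    # Headers end at the first blank line (CRLF CRLF).
    head, separator, _body = raw_request.partition(b"\r\n\r\n")
    cookie = None
    # Skip the request line, then look for a Cookie header.
    for line in head.split(b"\r\n")[1:]:
        name, colon, value = line.partition(b":")
        if colon and name.strip().lower() == b"cookie":
            cookie = value.strip().decode("latin-1")
            break
    return {
        # False means the capture ended before the header block did.
        "headers_complete": bool(separator),
        "cookie": cookie,
    }
```

A complete header block with a full cookie value would point the finger back at Apache; a missing or cut-off cookie would point at Chrome (or at something in between, like the load balancer).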
I see a number of posts out there in the wide world about this issue; people have seen this Chrome behavior on YouTube, Lycos, etc. Mostly they think that reloading or clearing the cache fixes it, but I suspect that those services also have large load-balanced clusters, and by luck of the draw they’re just getting a “good” server.
Are any other server admins out there having Chrome issues who can confirm this? I’d be really interested in knowing which Web servers/versions it’s affecting. A packet trace of a “bad” hit would probably show the root cause. I suspect that for some reason Chrome is only partially sending the cookie or whatnot, choking the hit.