Tag Archives: error

Java Docker Pull Travails

Just had a problem that I thought I’d document the solution to for the world…

In our build pipeline at work, we use maven and the fabric8 docker-maven-plugin to manage our builds.  We love it: developers can just “mvn install” locally, and the Atlassian Bamboo build system just “mvn deploy”s in exactly the same way.

Well, some of our builds suddenly weren’t able to pull the base images specified in our Dockerfiles from Docker Hub, breaking the build with 500 errors like:

[ERROR] DOCKER> Unable to pull 'library/debian:sid' from registry 'docker.io' : received unexpected HTTP status: 500 Server Error (Internal Server Error: 500) [received unexpected HTTP status: 500 Server Error (Internal Server Error: 500)]

But it worked fine on our local box, and the build could pull our custom images from Artifactory fine. What’s the problem here?  Bamboo?  The plugin? Well, some helpful community folks helped home in on it.  It turns out that some versions of Java 1.8 (8u131 and prior, going back at least to 8u112) have some problem (TLS? Root certs? Not really sure) that messes up pulling a docker.io image from inside Java during our docker build step.  My team’s microservices aren’t Java based, so the Java version doesn’t come up much, but of course maven uses Java.

Upgrading the JDK version to 8u144 made the problem go away.  We actually have an up-to-date, curated Java version we use in Bamboo for our Java builds, but folks doing Python builds were just using the default “JDK 1.8” that Atlassian puts on their Bamboo build agent AMI, which is of course old and suffers from this issue.
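If you want a build to fail fast instead of dying on a mysterious 500, a small version check is one option. Here is a minimal sketch in Python, assuming the affected range is exactly what we saw (8u112 through 8u131); the function names and constants are mine, and the lower bound is only what was observed, so older updates may be affected too.

```python
import re

# Range reported in this post: 8u131 and earlier, back to at least 8u112.
# The lower bound is only what was observed; treat it as an assumption.
SUSPECT_MIN_UPDATE = 112
SUSPECT_MAX_UPDATE = 131


def java8_update(version_output):
    """Return the Java 8 update number found in `java -version` output, or None."""
    match = re.search(r"\b1\.8\.0_(\d+)", version_output)
    return int(match.group(1)) if match else None


def is_suspect_jdk(version_output):
    """True if the output names a Java 8 update inside the suspect range."""
    update = java8_update(version_output)
    if update is None:
        # Not Java 8, or no update number in the string: nothing to flag.
        return False
    return SUSPECT_MIN_UPDATE <= update <= SUSPECT_MAX_UPDATE
```

Note that `java -version` writes to stderr, not stdout, so on a build agent you would feed it something like `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`.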

 


Filed under DevOps

Google Chrome Hates You (Error 320)

The 1.0 release of Google Chrome has everyone abuzz.  Here at NI, loads of people are adopting it.  Shortly after it went gold, we started to hear from users that they were having problems with our internal collaboration solution, based on the Atlassian Confluence wiki product.  They’d hit a page and get a terse error; clicking on “More Details” got you the slightly more helpful, or at least Googleable, string “Error 320 (net::ERR_INVALID_RESPONSE): Unknown error.”

At first, it seemed like if people reloaded or cleared cache the problem went away.  It turned out this wasn’t true – we have two load balanced servers in a cluster serving this site.  One server worked in Chrome and the other didn’t; reloading or otherwise breaking persistence just got you the working server for a time.  But both servers worked perfectly in IE and Firefox (every version we have lying around).

So we started researching.  Both servers were as identical as we could make them.  Was it a Confluence bug?  No, we have phpBB on both servers and it showed the same behavior – so it looked like an Apache level problem.

So I looked in the logs.  The failure didn’t generate an Apache error; it was still logged as a 200 OK response.  But when I compared the log lines, the box that Chrome was erroring on showed that the cookie wasn’t being passed up: that field was blank.  (It was populated with the cookie value on the other box, and on both boxes when hit in IE/Firefox.)  Both boxes had an identically compiled Apache 2.0.61.  I diffed all the config files and, except for boxname and IP, found no difference.  The problem persisted for more than a week.
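For anyone wanting to run the same comparison on their own servers, here is a rough sketch in Python of the log check. It assumes a hypothetical access log layout that ends with the status, byte count, quoted User-Agent and quoted Cookie header (something like a LogFormat ending in `%>s %b "%{User-Agent}i" "%{Cookie}i"`); that layout and the function name are my assumptions, not our actual config. It also won’t cope with escaped quotes inside the fields.

```python
import re
from collections import Counter

# Hypothetical log layout: ... "request" status bytes "user-agent" "cookie"
LINE_RE = re.compile(r'" (\d{3}) \S+ "([^"]*)" "([^"]*)"$')


def blank_cookie_hits(lines):
    """Count 200 responses per user agent whose logged Cookie field is blank."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if not match:
            continue  # Line doesn't fit the assumed layout; skip it.
        status, agent, cookie = match.groups()
        # Apache logs "-" for a missing header, so treat that as blank too.
        if status == "200" and cookie.strip() in ("", "-"):
            counts[agent] += 1
    return counts
```

Run it over the access log from each box and compare the counts per browser; on a “bad” server you would expect the Chrome user agents to stand out.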

We did a graceful Apache restart for kicks – no effect.  Desperate, we did a full Apache stop/start – and the problem disappeared!  Not sure for how long.  If it recurs, I’ll take a packet trace and see if Chrome is just not sending the cookie, or sending it partially, or sending it and it’s Apache jacking up…  But it’s strange there would be an Apache-end problem that only Chrome would experience.

I see a number of posts out there in the wide world about this issue; people have seen this Chrome behavior in YouTube, Lycos, etc.  Mostly they think that reloading/clearing cache fixes it but I suspect that those services also have large load balanced clusters, and by luck of the draw they’re just getting a “good” one.

Are any other server admins out there having Chrome issues who can confirm this?  I’d be real interested in knowing what Web servers/versions it’s affecting.  And a packet trace of a “bad” hit would probably show the root cause.  I suspect for some reason Chrome is partially sending the cookie or whatnot, choking the hit.


Filed under General, Uncategorized