I've been having an issue with my application for a while now, and I finally figured out what the problem was. In this particular case, the application is a web app (so, think REST API written in Go), and one of its nightly routines is to synchronize a whole bunch of data with various third-party ArcGIS systems. The application keeps a cache of the ArcGIS images, and this job updates them so they're only ever a day old. This allows it to show map overlays even if the underlying ArcGIS systems are inaccessible (they're random third-party systems that are frequently down for maintenance).
So, imagine 10 threads constantly making HTTP requests for new map tile images; once a large enough batch is done, the cache is updated, and then the process repeats until the entire cache has been refreshed.
In production, I never noticed a direct problem, but there were times when an ArcGIS system would just completely freak out and start lying about not supporting pagination anymore or otherwise spewing weird errors (but again, it's a third-party system, so what can you do?). In development, I would notice this particular endpoint failing after a while with a "dial" error of "too many open files". Every time that I looked, though, everything seemed fine, and I just forgot about it.
This last time, though, I watched the main application's open sockets ("ss -anp | grep my-application"), and I noticed that the number of connections just kept increasing. This reminded me of my old networking days, and it looked like the TCP connections were just accumulating until the OS felt like closing them due to inactivity.
That's when I found that Go's "http.Client" has a method called "CloseIdleConnections()" that immediately closes any idle connections without waiting for the OS to do it for you.
For reasons that are not relevant here, each request to a third-party ArcGIS system uses its own "http.Client", and because of that, there was no way to reuse any connections between requests, and the application just kept racking up open connections, eventually hitting the default limit of 1024 "open files". I simply added "defer httpClient.CloseIdleConnections()" after creating the "http.Client", and everything magically behaved as I expected: no more than 10 active connections at any time (one for each of the 10 threads running).
So, if your Go application is getting "too many open files" errors when making a lot of HTTP requests, be sure to either (1) re-architect your application to reuse your "http.Client" whenever possible, or (2) be sure to call "CloseIdleConnections()" on your "http.Client" as soon as you're done with it.
I suspect that some of the third-party ArcGIS issues that I was seeing in production might have essentially been DoS errors caused by my application assaulting these poor servers with thousands of connections.
No comments:
Post a Comment