Saturday, April 16, 2022

Golang, http.Client, and "too many open files"

I've been having an issue with my application for a while now, and I finally figured out what the problem was.  In this particular case, the application is a web app (so, think REST API written in Go), and one of its nightly routines is to synchronize a whole bunch of data with various third-party ArcGIS systems.  The application keeps a cache of the ArcGIS images, and this job updates them so they're only ever a day old.  This allows it to show map overlays even if the underlying ArcGIS systems are inaccessible (they're random third-party systems that are frequently down for maintenance).

So, imagine 10 threads constantly making HTTP requests for new map tile images; once a large enough batch is done, the cache is updated, and then the process repeats until the entire cache has been refreshed.

In production, I never noticed a direct problem, but there were times when an ArcGIS system would just completely freak out and start lying about not supporting pagination anymore or otherwise spewing weird errors (but again, it's a third-party system, so what can you do?).  In development, I would notice this particular endpoint failing after a while with a "dial" error of "too many open files".  Every time that I looked, though, everything seemed fine, and I just forgot about it.

This last time, though, I watched the main application's open sockets ("ss -anp | grep my-application"), and I noticed that the number of connections just kept increasing.  This reminded me of my old networking days, and it looked like the TCP connections were just accumulating until the OS felt like closing them due to inactivity.

That's when I found that Go's "http.Client" has a method called "CloseIdleConnections()" that immediately closes any idle connections without waiting for the OS to do it for you.

For reasons that are not relevant here, each request to a third-party ArcGIS system uses its own "http.Client", and because of that, there was no way to reuse any connections between requests, and the application just kept racking up open connections, eventually hitting the default limit of 1024 "open files".  I simply added "defer httpClient.CloseIdleConnections()" after creating the "http.Client", and everything magically behaved as I expected: no more than 10 active connections at any time (one for each of the 10 threads running).
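
Here's roughly what that fix looks like in code.  This is a minimal sketch rather than my actual code: the "fetchTile" name, the timeout, and the request details are placeholders, and the part that matters is the "defer httpClient.CloseIdleConnections()" right after the client is created.

    package tiles

    import (
        "context"
        "io"
        "net/http"
        "time"
    )

    // Sketch of the per-request pattern described above; names are placeholders.
    func fetchTile(ctx context.Context, url string) ([]byte, error) {
        // Each request gets its own client and transport (for reasons outside
        // this post), so its connection pool is never shared with anything else.
        httpClient := &http.Client{
            Timeout:   30 * time.Second,
            Transport: &http.Transport{},
        }

        // Without this, the idle keep-alive connection would just sit in this
        // client's pool, because this particular client is never used again.
        defer httpClient.CloseIdleConnections()

        req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
        if err != nil {
            return nil, err
        }
        resp, err := httpClient.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        return io.ReadAll(resp.Body)
    }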

So, if your Go application is getting "too many open files" errors when making a lot of HTTP requests, be sure to either (1) re-architect your application to reuse your "http.Client" whenever possible, or (2) be sure to call "CloseIdleConnections()" on your "http.Client" as soon as you're done with it.
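
For what it's worth, option (1) is the cleaner fix when your architecture allows it.  A minimal sketch of that shape (the numbers here are illustrative, not from my application): one client, created once, shared by every worker.

    package tiles

    import (
        "net/http"
        "time"
    )

    // One shared client for all worker goroutines: its Transport keeps a single
    // pool of keep-alive connections that actually get reused, instead of each
    // request opening (and then abandoning) its own.
    var sharedClient = &http.Client{
        Timeout: 30 * time.Second,
        Transport: &http.Transport{
            MaxIdleConnsPerHost: 10, // roughly one per worker thread in my case
            IdleConnTimeout:     90 * time.Second,
        },
    }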

I suspect that some of the third-party ArcGIS issues that I was seeing in production might have essentially been DoS errors caused by my application assaulting these poor servers with thousands of connections.

Saturday, April 2, 2022

Service workers, push notifications, and IndexedDB

I have a pretty simple use case: a user wanted my app to provide a list of "recent" notifications that had been sent to her.  Sometimes a lot of notifications will come through in a relatively short time period, and she wants to be able to look at the list of them to make sure that she's handled them all appropriately.

I ended up having the service worker write the notification to an IndexedDB and then having the UI reload the list of notifications when it receives a "message" event from the service worker.

Before we get there, I'll walk you through my process because it was painful.

Detour: All the mistakes that I made

Since I was already using HTML local storage for other things, I figured that I would just record the list of recent notifications in there.  Every time the page received a "message" event, it would add the event data to a list of notifications in local storage.  That kind of worked while I was debugging it: because I was actively looking at the page, the page was open and would receive the "message" event.

However, in the real world, my application is installed as a "home screen" app on Android and is usually closed.  When a notification arrived, there was no open page to receive the "message" event, and it was lost.

I then tried to have the service worker write to HTML local storage instead.  It wouldn't matter which side (page or service worker) actually wrote the data since both sides would detect a change immediately.  Except that's not how it works.  Service workers can't use HTML local storage at all: it's a synchronous API, and synchronous storage simply isn't exposed in worker contexts.

Anyway, HTML local storage was a dead end as a simple communication and storage mechanism.

Because the page was usually not open, MessageChannel and BroadcastChannel also wouldn't work: both need a live page on the other end at the moment the message is sent, and nothing is stored if nobody is listening.

I finally settled on using IndexedDB because a service worker is allowed to use it.  The biggest annoyance (in the design) was that there is no way to have a page "listen" for changes to an IndexedDB, so I couldn't just trivially tell my page to update the list of notifications to display when there was a change to the database.

After implementing IndexedDB, I spent a week trying to figure out why it wasn't working half the time, and that leads us to how service workers actually work.

Detour: How service workers work

Service workers are often described as a background process for your page.  The way that you hear about them, they sound like daemons that are always running and process events when they receive them.

But that's not anywhere near correct in terms of how they are implemented.  Service workers are more like "serverless" functions (such as Google Cloud Functions) in that they generally aren't running, but if a request comes in that they need to handle, then one is spun up to handle the request, and it'll be kept around for a few minutes in case any other requests come in for it to handle, and then it'll be shut down.

So my big mistake was thinking that once I initialized something in my service worker, it would be available more or less indefinitely.  The browser knows which events a service worker has registered ("push", "message", etc.), spins up a worker whenever one of them needs handling, and shuts it down again shortly thereafter.

Service workers have an "install" event that gets run when new service worker code gets downloaded.  This is intended to be run exactly once for that version of the service worker.

There is also an "activate" event that gets run once the new worker takes over as the active worker (typically right after "install" for a brand-new worker, or once any old version has been released).  Importantly, it does not fire again every time the browser spins the worker back up to handle an event, so any globals you set up here can be gone by the time the next event arrives; it's really only good for one-time housekeeping like clearing out old caches.

The "push" event is run when a push message has been received.  Whatever work you need to do should be done in the event's "waitUntil" method as a promise chain that ultimately results in showing a notification to the user.

Detour: How IndexedDB works

IndexedDB was seemingly invented by people who had no concept of Promises in JavaScript.  Its API is entirely insane and based on "onsuccess", "oncomplete", and "onerror" callbacks.  (You can technically also use event listeners, but it's just as insane.)  It's an asynchronous API that uses none of the standard asynchronous syntax that the rest of modern JavaScript uses.  It is what it is.

Here's what you need to know: everything in IndexedDB is callbacks.  Everything.  So, if you want to connect to a database, you'll need to make an IDBRequest and set its "onsuccess" callback.  Once you have the database, you'll need to create a transaction and set its "oncomplete" callback.  Then you can create another IDBRequest to read or write data in an object store (essentially a table) and set its "onsuccess" callback.  It's callback hell, but it is what it is.  (Note that there are wrapper libraries that provide Promise-based syntax, but I hate having to wrap a standard feature for no good reason.)

(Also, there's an "onupgradeneeded" callback at the database level that you can use to do any schema- or data-related work if you're changing the database version.)
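
To make that concrete, here's roughly what the callback dance looks like for a database with a single "notifications" object store.  This is a sketch with made-up names, not code lifted from my application:

    const openRequest = indexedDB.open('notifications-db', 1);

    // Runs only when the database is created or its version number changes.
    openRequest.onupgradeneeded = (event) => {
        const db = event.target.result;
        if (!db.objectStoreNames.contains('notifications')) {
            db.createObjectStore('notifications');
        }
    };

    openRequest.onerror = () => console.error('open failed', openRequest.error);

    openRequest.onsuccess = () => {
        const db = openRequest.result;

        // Every read or write happens inside a transaction...
        const tx = db.transaction('notifications', 'readwrite');
        const store = tx.objectStore('notifications');

        // ...and every operation is yet another request with its own callbacks.
        const getRequest = store.get('recent');
        getRequest.onsuccess = () => {
            const list = getRequest.result || [];
            list.push({ title: 'hello', receivedAt: Date.now() });
            store.put(list, 'recent');
        };

        tx.oncomplete = () => console.log('saved');
        tx.onerror = () => console.error('transaction failed', tx.error);
    };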

Putting it all together

I decided that there was no reason to waste cycles opening the IndexedDB on "activate" since there's no guarantee that it'll actually be used.  Instead, I had the "push" event use the previous database connection (if there was one) or create a new connection (if there wasn't).

I put together the following workflow for my service worker (there's a rough code sketch after the list):

  1. Register the "push" event handler ("event.waitUntil(...)"):
    1. (Promise) Connect to the IndexedDB.
      1. If we already have a connection from a previous call, then return that.
      2. Otherwise, connect to the IndexedDB and return that (and also store it for quick access the next time so we don't have to reconnect).
    2. (Promise) Read the list of notifications from the database.
    3. (Promise) Add the new notification to the list and write it back to the database.
    4. (Promise) Fire a "message" event to all active pages and show a notification if no page is currently visible to the user.
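
Here's the promised sketch of that service-worker workflow.  The helpers ("openDatabase", "readNotifications", "writeNotifications") are assumed to wrap the IndexedDB callbacks in Promises (an example of one is further down), and the payload shape is made up; treat this as the shape of the solution rather than my exact code.

    // Sketch only: helper names and the payload shape are illustrative.
    let cachedDb = null;

    function getDatabase() {
        // Reuse the connection if this worker instance already opened one;
        // otherwise open a new one and remember it for the next event.
        if (cachedDb) {
            return Promise.resolve(cachedDb);
        }
        return openDatabase().then((db) => {
            cachedDb = db;
            return db;
        });
    }

    self.addEventListener('push', (event) => {
        const payload = event.data ? event.data.json() : {};

        const recordAndNotify = getDatabase()
            .then((db) => readNotifications(db)
                .then((list) => writeNotifications(db, [...list, payload])))
            // The history is a nice-to-have; never let a database error block
            // the notification itself.
            .catch((err) => console.error('could not record notification', err))
            .then(() => self.clients.matchAll({ type: 'window', includeUncontrolled: true }))
            .then((clients) => {
                clients.forEach((client) => client.postMessage(payload));
                const pageVisible = clients.some((c) => c.visibilityState === 'visible');
                if (!pageVisible) {
                    return self.registration.showNotification(payload.title || 'Notification', {
                        body: payload.body || '',
                    });
                }
            });

        event.waitUntil(recordAndNotify);
    });
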
And for my page (again, a sketch follows the list):
  1. Load the list of notifications from the IndexedDB when the page loads.  (This sets our starting point, and any changes will be communicated by a "message" event from the service worker.)
  2. Register the "message" event handler:
    1. Reload the list of notifications from the IndexedDB.  (Remember, there's no way to be notified on changes, so receiving the "message" event and reloading is the best that we can do.)
    2. (Handle the message normally; for me, this shows a little toast with the message details and a link to click on to take the user to the appropriate screen.)
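
And the matching page-side sketch, where "loadNotifications" is assumed to be another Promise-returning read helper and "renderNotificationList" / "showToast" stand in for my actual UI code:

    navigator.serviceWorker.register('/service-worker.js');

    // Starting point: whatever is already in the database when the page loads.
    loadNotifications().then(renderNotificationList);

    // Any later change is signaled by a "message" from the service worker;
    // reloading the whole list is the best we can do, since IndexedDB has no
    // change events.
    navigator.serviceWorker.addEventListener('message', (event) => {
        loadNotifications().then(renderNotificationList);
        showToast(event.data);
    });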

For me, the database work is a nice-to-have; the notification is the critical part of the workflow.  So I made sure that every database-related error was handled and the Promises resolved no matter what.  This way, even if there was a completely unexpected database issue, it would just get quietly skipped and the notification could be shown to the user.

In my code, I created some simple functions (to deal with the couple of IndexedDB interactions that I needed) that return Promises so I could chain them normally.  You could technically just do a single "new Promise(...)" to cover all of the IndexedDB work if you wanted, or you could use one of those fancy wrapper libraries.  In any case, you must call "event.waitUntil" with a Promise chain that ultimately resolves after doing something with the notification.  How you get there is up to you.
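
For example, a Promise-returning read helper along those lines might look roughly like this (using the same made-up store and key names as the sketches above):

    // Wrap a single IndexedDB read in a Promise so the rest of the handler can
    // chain it like any other asynchronous call.
    function readNotifications(db) {
        return new Promise((resolve, reject) => {
            const tx = db.transaction('notifications', 'readonly');
            const request = tx.objectStore('notifications').get('recent');
            request.onsuccess = () => resolve(request.result || []);
            request.onerror = () => reject(request.error);
        });
    }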

I was also using IndexedDB as a kind of asynchronous local storage, so I didn't need fancy keys or sorting or anything.  I just put all of my data under a single key that I could "get" and "put" trivially without having to worry about row counts or any other kind of data management.  There's a single object store with a single row in it.