Sunday, January 9, 2022

Moving (or renaming) a push-notification ServiceWorker

Service Workers (web workers, etc.) are a relatively new concept.  They can do all kinds of cool things (primarily related to network requests), but they are also the mechanism by which a web site can receive push messages (via Web Push) and show them as OS notifications.

The general rule of Service Workers is to pick a file name (such as "/service-worker.js") and never, ever change it.  That's cool, but sometimes you do need to change it.

In particular, I started my push messaging journey with "platinum-push-messaging", a now-defunct web component built by Google as part of the initial Polymer project.  The promise was cool: just slap this HTML element on your page with a few parameters and boom: you have working push notifications.

When it came out, the push messaging spec was young, and no browsers fully supported its encrypted data payloads, so "platinum-push-messaging" did a lot of work to work around that limitation.  As browsers improved to support the VAPID spec, "platinum-push-messaging" (along with all of the other "platinum" elements) were quietly deprecated and archived (around 2017).

This left me with a problem: a rotting push notification system that couldn't keep up with the spec and the latest browsers.  I hacked the code to all hell to support VAPID and keep the element functioning, but I was just punting.

Apple ruined the declarative promise of the Polymer project by refusing to implement HTML imports, so the web components community adopted the NPM distribution model (and introduced a whole bunch of imperative Javascript drama and compilation tools).  Anyway, no modern web components are installed with Bower anymore, so that left me with a deprecated Service Worker in a path that I wanted to get rid of: "bower_components/platinum-push-messaging/service-worker.js"

Here was my problem:

  1. I wanted the push messaging Service Worker under my control at the top level of my application, "/push-service-worker.js".
  2. I had hundreds of users who were receiving push notifications via this system, and the upgrade had to be seamless (users couldn't be forced to take any action).
I ended up solving the problem by essentially performing a switcheroo:
  1. I had my application store the Web Push subscription info in HTML local storage.  This would be necessary later as part of the switcheroo.
  2. I removed "bower_components/platinum-push-messaging/".  Any existing clients would regularly attempt to update the service worker, but it would quietly fail, leaving the existing one running just fine.
  3. I removed all references to "platinum-push-messaging" from my code.  The existing Service Worker would continue to run (because that's what Service Workers do) and receive push messages (and show notifications).
  4. I made my own push-messaging web component with my own service worker living at "/push-service-worker.js".
  5. (This laid the framework for performing the switcheroo.)
  6. Upon loading, the part of my application that used to include "platinum-push-messaging" did a migration, if necessary, before loading the new push-messaging component:
    1. It went through all the Service Workers and looked for any legacy ones (these had "$$platinum-push-messaging$$" in the scope).  If it found any, it killed them.

      Note that the "$$platinum-push-messaging$$" in the scope was a cute trick by the web component: a page can only be controlled by one Service Worker, and the scope dictates what that Service Worker can control.  By injecting a bogus "$$platinum-push-messaging$$" at the end of the scope, it ensured that the push-messaging Service Worker couldn't accidentally control any pages and get in the way of a main Service Worker.
    2. Upon finding any legacy Service Workers, it would:
      1. Issue a delete to the web server for the old (legacy) subscription (which was stored in HTML local storage).
      2. Tell the application to auto-enable push notifications.
      3. Resume the normal workflow for the application.
  7. The normal workflow for the application entailed loading the new push-messaging web component once the user was logged in.  If a Service Worker was previously enabled, then it would remain active and enabled.  Otherwise, the application wouldn't try to annoy users by asking them for push notifications.
  8. After the new push-messaging web component was included, it would then check to see if it should be auto-enabled (it would only be auto-enabled as part of a successful migration).
    1. If it was auto-enabled, then it would enable push messaging (the user would have already given permission by virtue of having a legacy push Service Worker running).  When the new push subscription was ready, it would post that information to the web server, and the user would have push messages working again, now using the new Service Worker.  The switcheroo was complete.
That's a bit wordy for a simple switcheroo, but it was very important for me to ensure that my users never lost their push notifications as part of the upgrade.  The simple version is: detect legacy Service Worker, kill legacy Service Worker, delete legacy subscription from web server, enable new Service Worker, and save new subscription to web server.

For any given client, the switcheroo happens only once.  The moment that the legacy Service Worker has been killed, it'll never run again (so there's a chance that if the user killed the page in the milliseconds after the kill but before the save, then they'd lose their push notifications, but I viewed this as extremely unlikely; I could technically have stored a status variable, but it wasn't worth it).  After that, it operates normally.

This means that there are one of two ways for a user to be upgraded:
  1. They open up the application after it has been upgraded.  The application prompts them to reload to upgrade if it detects a new version, but eventually the browser will do this on its own, typically after the device reboots or the browser has been fully closed.
  2. They click on a push notification, which opens up the application (which is #1, above).
So at this point, it's a waiting game.  I have to maintain support for the switcheroo until all existing push subscriptions have been upgraded.  The new ones have a flag set in the database, so I just need to wait until all subscriptions have the flag.  Active users who are receiving push notifications will eventually click on one, so I made a note to revisit this and remove the switcheroo code once all of the legacy subscriptions have been removed.

I'm not certain what causes a new subscription to be generated (different endpoint, etc.), but I suspect that it has to do with the scope of the Service Worker (otherwise, how would it know, since service worker code can change frequently?).  I played it safe and just assumed that the switcheroo would generate an entirely new subscription, so I deleted the legacy one no matter what and saved the new one no matter what.