After migrating my application from Google Compute Engine to Google Cloud Run, I suddenly had a use case for optimizing CPU utilization.
When I profiled my most CPU-intensive workloads, it turned out that the majority of the time was spent encoding PNG files.
tl;dr Use image.NRGBA when you intend to encode a PNG file.
(For reference, this particular application has a Google Maps overlay that synthesizes data from other sources into tiles to be rendered on the map. The main synchronization job runs nightly and attempts to build or download new tiles for the various layers based on data from various ArcGIS systems.)
Looking at my code, I couldn't really reduce the number of calls to png.Encode, but that encoder looked inefficient. I deleted the callgrind files (sorry), but roughly half of the CPU time in png.Encode was spent in memory operations and runtime calls.
I started looking around for options to pass to the encoder, or perhaps a more purpose-built implementation. I ended up finding a package that mentioned a speedup, but only for NRGBA images. However, that package looked fairly unused, and I wasn't about to turn all of my image processing over to something with 1 commit and no users.
This got me thinking, though: what is NRGBA?
It turns out that there are (at least) two ways of thinking about the whole alpha channel thing in images:
- In RGBA, each of the red, green, and blue channels has already been premultiplied by the alpha channel, such that the value of, for example, R can range from 0 to A, but no higher.
- In NRGBA, each of the red, green, and blue channels has its original value, and the alpha channel merely represents the opacity of the pixel in general.
Having used various tools and software over the years, when I think of "RGBA", I think of "one channel each for red, green, and blue, and one channel for the opacity of the pixel". In other words, what I've been picturing all along is actually "NRGBA" (non-premultiplied RGBA).
(Apparently there are good use cases for both, and when compositing, at some point you'll have to multiply by the alpha value, so "RGBA" already has that done for you.)
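As a sketch of that compositing argument (toy numbers of my own, one channel, alpha normalized to [0, 1]): the "over" operator needs the products src×srcA and dst×dstA, which premultiplied storage has already computed for you:

```go
package main

import "fmt"

func main() {
	// Compositing src over dst, single channel, alpha in [0,1].
	srcA, dstA := 0.5, 1.0
	src, dst := 1.0, 0.25 // non-premultiplied channel values

	// Non-premultiplied: multiply by alpha at composite time.
	outA := srcA + dstA*(1-srcA)
	out := (src*srcA + dst*dstA*(1-srcA)) / outA

	// Premultiplied: the src*srcA products are already stored,
	// so compositing is a single multiply-add per channel.
	pSrc, pDst := src*srcA, dst*dstA
	pOut := pSrc + pDst*(1-srcA)

	fmt.Println(out, pOut/outA) // both print 0.625
}
```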
Okay, whatever, so what does this have to do with CPU optimization?
In Go, the png.Encode function is optimized for NRGBA images. There's a tiny little hint about this in the comment for the function:
> Any Image may be encoded, but images that are not image.NRGBA might be encoded lossily.
This is corroborated by the PNG rationale document, which explains that
> PNG uses "unassociated" or "non-premultiplied" alpha so that images with separate transparency masks can be stored losslessly.
If you want to have the best PNG encoding experience, then you should encode images that use NRGBA already. In fact, if you open up the code, you'll see that it will convert the image to NRGBA if it's not already in that format.
Coming back to my callgrind analysis, this is where all that CPU time was spent: converting an RGBA image to an NRGBA image. It had certainly seemed strange how much work was being done to create a simple PNG file from a mostly-transparent map tile.
Why did I even have RGBA images? Well, my tiling API has to composite tiles from other systems into a single PNG file, so I simply created that new image with image.NewRGBA. And why that function? Because as I mentioned before, I figured "RGBA" meant "RGB with an alpha channel", which is what I wanted so that it would support transparency. It never occurred to me that "RGBA" was some weird encoding scheme for pixels in contrast to another encoding scheme called "NRGBA"; my use cases had never had me make such a distinction.
Anyway, after switching a few image.NewRGBA calls to image.NewNRGBA (and that was literally it; no other code changed), my code was way more efficient, cutting CPU utilization by something like 50-70%. Those RGBA-to-NRGBA conversions really hurt.