I've been using KDE for over a decade now, and something that started happening in the past year or two (at least on Kubuntu 22.04) would be that my whole screen would mostly freeze. Generally, I'd be able to alt-tab between windows, interact with them, etc., but I couldn't click on or interact with anything related to the window manager (the title bars, the task bar, etc.).
In my case, I'd immediately notice when I came back to my desk and there was obviously a notification at some point, but the rendering got all screwed up:
killall plasmashell
plasmashell --replace
I recently had to create a Jenkins job that needed to use a lot of disk space. The short version of the story is that the job needed to dump the contents of a Postgres database and upload that to Artifactory, and the "jfrog" command line tool won't let you stream an upload, so the entire dump had to be present on disk in order for it to work.
I run my Jenkins on Kubernetes, and the Kubernetes hosts absolutely didn't have the disk space needed to dump this database, and it was definitely too big to use a memory-based filesystem.
The solution was to use a dynamic Persistent Volume Claim, which is maybe(?) implemented as an ephemeral volume in Kubernetes, but the exact details of what it does under the hood aren't important. What is important is that, as part of the job running, a new Persistent Volume Claim (PVC) gets created and is available for all of the containers in the pod. When the job finishes, the PVC gets destroyed. Perfect.
I couldn't figure out how to create a dynamic PVC as an ordinary volume that would get mounted on all of my containers (it's a thing, but apparently not for a declarative pipeline), but I was able to get the "workspace" dynamic PVC working.
A "workspace" volume is shared across all of the containers in the pod and has the Jenkins workspace mounted on it. This holds all of the Git contents, including the Jenkinsfile, for the job (I'm assuming that you're using Git-based jobs here). Since all of the containers share the same workspace volume, any work done in one container is instantly visible in all of the others, with no need for Jenkins stashes or anything similar.
The biggest problem that I ran into was the permissions on the "workspace" file system. Each of my containers had a different idea of which UID its user would run as, but all of the containers have to agree on the permissions of the shared "workspace" volume.
I ended up cheating and just forcing all of my containers to run as root (UID 0), since (1) everyone could agree on that, and (2) I didn't have to worry about "sudo" not being installed on some of the containers that needed to install packages as part of their setup.
To use a "workspace" volume, set workspaceVolume inside the kubernetes block:
kubernetes {
    workspaceVolume dynamicPVC(accessModes: 'ReadWriteOnce', requestsSize: "300Gi")
    yaml '''
---
apiVersion: v1
kind: Pod
spec:
  securityContext:
    fsGroup: 0
    runAsGroup: 0
    runAsUser: 0
  containers:
  [...]
'''
}
In this example, we allocate a 300GiB volume that lives for the duration of the job.
For more information about using Kubernetes agents in Jenkins, see the official docs, but (at least as of the time of this writing) they're missing a whole lot of information about volume-related things.
If you see Jenkins trying to create and then delete pods over and over and over again, you have something else wrong. In my case, the Kubernetes service account that Jenkins uses didn't have any permissions around "persistentvolumeclaims" objects, so every time the Pod was created, it would fail, and Jenkins would try again.
I was only able to see the errors in the Jenkins logs in Kubernetes; they looked something like this:
Caused: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.100.0.1:443/api/v1/namespaces/cicd/persistentvolumeclaims. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. persistentvolumeclaims is forbidden: User "system:serviceaccount:cicd:default" cannot create resource "persistentvolumeclaims" in API group "" in the namespace "cicd".
I didn't have the patience to figure out exactly what was needed, so I just gave it everything:
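If you do have the patience, a more scoped version might look something like the Role and RoleBinding below. This is a sketch, not the config I actually used: the "jenkins-pvc" names are made up, the "cicd" namespace and "default" service account come from the error message above, and the verb list is illustrative rather than proven minimal.

```yaml
# Hypothetical RBAC granting the Jenkins service account PVC access.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins-pvc
  namespace: cicd
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["create", "delete", "get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jenkins-pvc
  namespace: cicd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jenkins-pvc
subjects:
  - kind: ServiceAccount
    name: default
    namespace: cicd
```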
Apache has its own built-in authentication system(s) for providing access control to a site that it's hosting. You've probably encountered this before using "basic" authentication backed by a flatfile created and edited using the htpasswd command.
If you do this using the common guides on the Internet (for example, this guide from Apache itself), then when you go to your site, you'll be presented with your browser's built-in basic-authentication dialog box asking for a username and password. If you provide valid credentials, then you'll be moved on through to the main site, and if you don't, then it'll dump you to a plain "401 Unauthorized" page.
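As a refresher, that classic "basic" setup looks roughly like this (the directory and htpasswd paths here are placeholders, not anything from my actual config):

```apache
<Directory "/var/www/html/protected">
    AuthType Basic
    AuthName "Restricted Area"
    AuthBasicProvider file
    AuthUserFile /path/to/your/htpasswd.users
    Require valid-user
</Directory>
```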
This works fine, but it has three main drawbacks:
(And the built-in popup is really ugly, and it submits the password in plaintext, etc., etc.)
To solve this problem, Apache has a type of authentication called "form" that adds an extra step involving an HTML form (that's fully customizable).
The workflow is as follows:
1. The user requests a protected page and, having no valid session, gets sent to the login page.
2. The user submits the form, and Apache's form-login-handler checks the credentials against your authentication provider(s).
3. On success, Apache sets a session cookie and redirects the user onward; on failure, it sends them back to the login page.
On Ubuntu, I believe that these were all installed out of the box but needed to be enabled separately. On Red Hat, I had to install the mod_session package, but everything was otherwise already enabled.
If you want to try out "form" authentication, I recommend that you get everything working with "basic" authentication first. This is especially true if you have multiple directories that need to be configured separately.
For this example, I'm going to use our Nagios server.
There were two directories that needed to be protected: "/usr/local/nagios/sbin" and "/usr/local/nagios/share". This setup is generally described by this document (although it covers "digest" authentication instead of "basic").
For both directories that already had "AuthType" set up, the changes are simple:
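The gist of the change is swapping "AuthType Basic" for "AuthType form" and adding the session directives. A sketch for one of the directories might look like this (the AuthUserFile path is a placeholder, and I'm assuming a login page at "/login.html" as described below):

```apache
<Directory "/usr/local/nagios/sbin">
    AuthType form
    AuthName "Nagios Access"
    AuthFormProvider file
    AuthUserFile /path/to/your/htpasswd.users
    Session On
    SessionCookieName session path=/
    ErrorDocument 401 /login.html
    Require valid-user
</Directory>
```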
I decided to put my login page at "/login.html" because that makes sense, but you could put it anywhere (and even host it on a different server if you specify a full URL instead of just a path).
That page should contain a "form" with two "input" elements: "httpd_username" and "httpd_password". The form "action" should be set to "/do-login.html" (or whatever handler you want to register with Apache).
At its simplest, "login.html" looks like this:
<form method="POST" action="/do-login.html">
Username: <input type="text" name="httpd_username" value="" />
Password: <input type="password" name="httpd_password" value="" />
<input type="submit" name="login" value="Login" />
</form>
You'll probably want an "html" tag, a title and body and such, maybe some CSS, but this'll get the job done.
The last step is to register the thing that'll actually process the form data: "/do-login.html"
In your Apache config, add a "location" for it:
<Location "/do-login.html">
SetHandler form-login-handler
AuthType form
AuthName "Nagios Access"
AuthFormProvider file
AuthUserFile /path/to/your/htpasswd.users
AuthFormLoginRequiredLocation "/login.html"
AuthFormLoginSuccessLocation "/nagios/"
Session On
SessionCookieName session path=/
</Location>
The key thing here is SetHandler form-login-handler. This tells Apache to use its built-in form handler to take the values from httpd_username and httpd_password and compare them against your authentication provider(s) (in this example, it's just a flatfile, but you could use LDAP, etc.).
The other two options handle the last bit of navigation. AuthFormLoginRequiredLocation sends you back to the login page if the username/password combination didn't work (you could potentially have another page here with an error message pre-written). AuthFormLoginSuccessLocation sends you to the place where you want the user to go after login (I'm sending the user to the main Nagios page, but you could send them anywhere).
I've just covered the "file" authentication provider here. If you use "ldap" and/or any others, then that config will need to be copied to every single place where you have "form" authentication set up, just like you would if you were only using the "file" provider.
I found this to be really annoying, since I had two directories to protect plus the form handler, and that adds another four lines or so to each config section, but what matters is that it works.
From time to time, I'll have a use case where some box needs to talk to some website that it can't reach (through networking issues), and the easiest thing to do is to throw an nginx reverse proxy on a network that it can reach (such that the reverse proxy can reach both).
The whole shtick of a reverse proxy is that you can access the reverse proxy directly and it'll forward the request on to the appropriate destination and more or less masquerade itself as if it were the destination. This is in contrast with a normal HTTP proxy that would be configured separately (if supported by whatever tool you're trying to use). Sometimes a normal HTTP proxy is the best tool for the job, but sometimes you can cheat with a tweak to /etc/hosts and a reverse proxy and nobody needs to know what happened.
Here, we're focused on the reverse proxy.
In this case, we have the following scenario:
- Box 1 needs to talk to https://site1.example.com but can't reach it directly (networking issues).
- Box 2 can reach both Box 1 and site1.example.com, so it hosts the nginx reverse proxy.
- Box 1's /etc/hosts is tweaked so that site1.example.com resolves to Box 2.
At first, I was seeing this error message on the reverse proxy's "nginx/error.log":
connect() to XXXXXX:443 failed (13: Permission denied) while connecting to upstream, client: XXXXXX, server: site1.example.com, request: "GET / HTTP/1.1"
"Permission denied" isn't great, and that told me that it was something OS-related.
Of course, it was an SELinux thing (in /var/log/messages):
SELinux is preventing /usr/sbin/nginx from name_connect access on the tcp_socket port 443.
The workaround was:
setsebool -P nis_enabled 1
This also was suggested by the logs, but it didn't seem to matter:
setsebool -P httpd_can_network_connect 1
After fixing that, I was seeing:
SSL_do_handshake() failed (SSL: error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:SSL alert number 40) while SSL handshaking to upstream, client: XXXXXX, server: site1.example.com, request: "GET / HTTP/1.1"
After tcpdump-ing the traffic from Box 1 and also another box that could directly talk to site1.example.com, it was clear Box 1 was not using SNI in its requests (SNI is a TLS extension that passes the host name in plaintext so that proxies and load balancers can properly route name-based requests).
It took way too long for me to find the nginx setting to enable it (I don't know why it's disabled by default), but it's:
proxy_ssl_server_name on;
Anyway, the final nginx config for the reverse proxy on Box 2 was:
server {
    listen 443 ssl;
    server_name site1.example.com;

    ssl_certificate /etc/nginx/ssl/server.crt;
    ssl_certificate_key /etc/nginx/server.key;
    ssl_protocols TLSv1.2;

    location / {
        proxy_pass https://site1.example.com;
        proxy_ssl_session_reuse on;
        proxy_ssl_server_name on;
    }
}
As far as Box 1 was concerned, it could connect to site1.example.com with only a small tweak to /etc/hosts.
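For completeness, that "small tweak" is just a hosts entry pointing the site's name at Box 2 (the IP below is a placeholder from the documentation range, not Box 2's real address):

```
# /etc/hosts on Box 1: resolve site1.example.com to Box 2
192.0.2.10    site1.example.com
```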
After migrating my application from Google Compute Engine to Google Cloud Run, I suddenly had a use case for optimizing CPU utilization.
In my analysis of my most CPU-intensive workloads, it turned out that the majority of the time was spent encoding PNG files.
tl;dr Use image.NRGBA when you intend to encode a PNG file.
(For reference, this particular application has a Google Maps overlay that synthesizes data from other sources into tiles to be rendered on the map. The main synchronization job runs nightly and attempts to build or download new tiles for the various layers based on data from various ArcGIS systems.)
Looking at my code, I couldn't really reduce the number of calls to png.Encode, but that encoder really looked inefficient. I deleted the callgrind files (sorry), but basically, half of the CPU time in png.Encode was around memory operations and some runtime calls.
I started looking around for maybe some options to pass to the encoder or a more purpose-built implementation. I ended up finding a package that mentioned a speedup, but only for NRGBA images. However, that package looked fairly unused, and I wasn't about to turn all of my image processing over to something with one commit and no users.
This got me thinking, though: what is NRGBA?
It turns out that there are (at least) two ways of thinking about the whole alpha channel thing in images:
- "Premultiplied" (or "associated") alpha, where the stored red, green, and blue values have already been multiplied by the alpha value.
- "Non-premultiplied" (or "unassociated") alpha, where the color channels are stored at full strength and the alpha value is kept separate.
For my human mind, using various tools and software over the years, when I think of "RGBA", I think of "one channel each for red, green, and blue, and one channel for the opacity of the pixel". So what this means is that I'm thinking of "NRGBA" (for non-premultiplied RGBA).
(Apparently there are good use cases for both, and when compositing, at some point you'll have to multiply by the alpha value, so "RGBA" already has that done for you.)
Okay, whatever, so what does this have to do with CPU optimization?
In Go, the png.Encode function is optimized for NRGBA images. There's a tiny little hint about this in the comment for the function:
Any Image may be encoded, but images that are not image.NRGBA might be encoded lossily.
This is corroborated by the PNG rationale document, which explains that
PNG uses "unassociated" or "non-premultiplied" alpha so that images with separate transparency masks can be stored losslessly.
If you want to have the best PNG encoding experience, then you should encode images that use NRGBA already. In fact, if you open up the code, you'll see that it will convert the image to NRGBA if it's not already in that format.
Coming back to my callgrind analysis, this is where all that CPU time was spent: converting an RGBA image to an NRGBA image. I certainly thought that it was strange how much work was being done creating a simple PNG file from a mostly-transparent map tile.
Why did I even have RGBA images? Well, my tiling API has to composite tiles from other systems into a single PNG file, so I simply created that new image with image.NewRGBA. And why that function? Because as I mentioned before, I figured "RGBA" meant "RGB with an alpha channel", which is what I wanted so that it would support transparency. It never occurred to me that "RGBA" was some weird encoding scheme for pixels in contrast to another encoding scheme called "NRGBA"; my use cases had never had me make such a distinction.
Anyway, after switching a few image.NewRGBA calls to image.NewNRGBA (and literally that was it; no other code changed), my code was way more efficient, cutting down on CPU utilization by something like 50-70%. Those RGBA to NRGBA conversions really hurt.
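To tie it together, here's a minimal sketch of the fixed version (the tile size and the drawing are made up for illustration; the only change that matters is image.NewNRGBA where image.NewRGBA used to be):

```go
package main

import (
	"bytes"
	"image"
	"image/color"
	"image/png"
)

// encodeTile builds a mostly-transparent tile and encodes it as a PNG.
// Because the image is already NRGBA, png.Encode can write the pixel data
// directly; with image.NewRGBA, the encoder would first convert every pixel.
func encodeTile() ([]byte, error) {
	img := image.NewNRGBA(image.Rect(0, 0, 256, 256))

	// Draw a small semi-transparent marker; the rest stays transparent.
	for y := 0; y < 16; y++ {
		for x := 0; x < 16; x++ {
			img.SetNRGBA(x, y, color.NRGBA{R: 255, A: 128})
		}
	}

	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := encodeTile()
	if err != nil {
		panic(err)
	}
	println("encoded", len(data), "bytes")
}
```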