Friday, February 10, 2023

Using a dynamic PVC on Kubernetes agents in Jenkins

I recently had to create a Jenkins job that needed to use a lot of disk space. The short version of the story is that the job needed to dump the contents of a Postgres database and upload that to Artifactory, and the "jfrog" command line tool won't let you stream an upload, so the entire dump had to be present on disk in order for it to work.

I run my Jenkins on Kubernetes, and the Kubernetes hosts absolutely didn't have the disk space needed to dump this database.  The dump was also definitely too big for a memory-based filesystem.

The solution was to use a dynamic Persistent Volume Claim, which is maybe(?) implemented as an ephemeral volume in Kubernetes, but the exact details of what it does under the hood aren't important.  What is important is that, when the job runs, a new Persistent Volume Claim (PVC) gets created and is available to all of the containers in the pod.  When the job finishes, the PVC gets destroyed.  Perfect.

I couldn't figure out how to create a dynamic PVC as an ordinary volume that would get mounted on all of my containers (it's a thing, but apparently not for a declarative pipeline), but I was able to get the "workspace" dynamic PVC working.

A "workspace" volume is shared across all of the containers in the pod and has the Jenkins workspace mounted on it.  The workspace holds all of the Git contents for the job, including the Jenkinsfile (I'm assuming that you're using Git-based jobs here).  Since all of the containers share the same workspace volume, any work done in one container is instantly visible in all of the others, without the need for Jenkins stashes or anything.

The biggest problem that I ran into was the permissions on the "workspace" file system.  Each of my containers had a different idea of what the UID of the user running the container would be, and all of the containers have to agree on the permissions around their "workspace" volume.

I ended up cheating and just forcing all of my containers to run as root (UID 0), since (1) everyone could agree on that, and (2) I didn't have to worry about "sudo" not being installed on some of the containers that needed to install packages as part of their setup.

Using "workspace" volumes

To use a "workspace" volume, set workspaceVolume inside the kubernetes block:

kubernetes {
   workspaceVolume dynamicPVC(accessModes: 'ReadWriteOnce', requestsSize: "300Gi")
   yaml '''
---
apiVersion: v1
kind: Pod
spec:
   securityContext:
      fsGroup: 0
      runAsGroup: 0
      runAsUser: 0
   containers:
[...]

In this example, we allocate a 300 GiB volume for the duration of the job.

In addition, you can see that I set the user and group information to 0 (for "root"), which let me work around all the annoying UID mismatches across the containers.  If you only have one container, then obviously you don't have to do this.  Also, if you have full control of your containers, then you can probably set them up with a known user with a fixed UID who can sudo, etc., as necessary.
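For the fixed-UID route, a minimal sketch of the pod spec might look like the following.  It assumes that every image in the pod can run as UID/GID 1000 (a value picked purely for illustration), and the container names and images are hypothetical placeholders rather than anything from the job described here.

```yaml
# Sketch only: UID/GID 1000 and both containers are assumptions for illustration.
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsUser: 1000    # every container in the pod runs as this UID
    runAsGroup: 1000   # ...with this primary GID
    fsGroup: 1000      # asks Kubernetes to make mounted volumes group-owned by this GID
  containers:
    - name: dump            # hypothetical container name
      image: postgres:15    # placeholder image
      command: ["sleep"]
      args: ["99d"]         # keep the container alive so pipeline steps can run in it
    - name: upload          # hypothetical container name
      image: alpine:3.19    # placeholder image
      command: ["sleep"]
      args: ["99d"]
```

This would go inside the yaml '''...''' string of the kubernetes block, in place of the root settings.  Whether fsGroup actually changes ownership depends on the storage driver behind the PVC, so it's worth checking the permissions on the workspace directory from inside each container.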

For more information about using Kubernetes agents in Jenkins, see the official docs, but (at least as of the time of this writing) they're missing a whole lot of information about volume-related things.

Troubleshooting

If you see Jenkins trying to create and then delete pods over and over and over again, then something else is wrong.  In my case, the Kubernetes service account that Jenkins uses didn't have any permissions on "persistentvolumeclaims" objects, so every time that the Pod was created, it would fail and try again.

I was only able to see the errors in the Jenkins logs in Kubernetes; they looked something like this:

Caused: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.100.0.1:443/api/v1/namespaces/cicd/persistentvolumeclaims. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. persistentvolumeclaims is forbidden: User "system:serviceaccount:cicd:default" cannot create resource "persistentvolumeclaims" in API group "" in the namespace "cicd".

I didn't have the patience to figure out exactly what was needed, so I just gave it everything:

- verbs:
    - create
    - delete
    - get
    - list
    - patch
    - update
    - watch
  apiGroups:
    - ''
  resources:
    - persistentvolumeclaims
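
The rule above is only a fragment of a Role.  For context, a complete Role and RoleBinding might look something like the sketch below.  The namespace ("cicd") and service account ("default") are taken from the error message; the object name "jenkins-pvc-manager" is hypothetical, so adjust all of these to match your own setup.

```yaml
# Sketch only: the Role/RoleBinding name is hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins-pvc-manager   # hypothetical name
  namespace: cicd             # namespace from the error message
rules:
  - verbs: [create, delete, get, list, patch, update, watch]
    apiGroups: ['']           # '' is the core API group, where PVCs live
    resources: [persistentvolumeclaims]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jenkins-pvc-manager   # hypothetical name
  namespace: cicd
subjects:
  - kind: ServiceAccount
    name: default             # service account from the error message
    namespace: cicd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jenkins-pvc-manager   # must match the Role above
```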