tag:blogger.com,1999:blog-14984392488602520272024-03-13T23:43:19.011-04:00Sense CodonsDoughttp://www.blogger.com/profile/08152661329266416713noreply@blogger.comBlogger43125tag:blogger.com,1999:blog-1498439248860252027.post-63918857246473326682023-04-05T11:09:00.005-04:002023-04-05T11:11:16.557-04:00Dealing with KDE "plasmashell" freezing<p>I've been using KDE for over a decade now, and something that started happening in the past year or two (at least on Kubuntu 22.04) would be that my whole screen would mostly freeze. Generally, I'd be able to alt-tab between windows, interact with them, etc., but I couldn't click on or interact with anything related to the window manager (the title bars, the task bar, etc.).</p><p>In my case, I'd immediately notice when I came back to my desk and there was obviously a notification at some point, but the rendering got all screwed up:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjn-pCU4dJLl2atBSaMnzUEYczE8-01-o8bc_9tkIrGe9gnMCEEgY_5gsHCMizIKhr9nAaYOgwYt4YvN6iDRRq9HQdVjN32UikS9EjCECsLrdlTZ0k1J4sjJ-qzle-3D6Ay_RYLkTD4hop6fUxuEJDnzl0YqsSNeNYSEehN0NzOWzgQZVMMprsBQo3s/s722/kde-plasma-froen.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="451" data-original-width="722" height="250" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjn-pCU4dJLl2atBSaMnzUEYczE8-01-o8bc_9tkIrGe9gnMCEEgY_5gsHCMizIKhr9nAaYOgwYt4YvN6iDRRq9HQdVjN32UikS9EjCECsLrdlTZ0k1J4sjJ-qzle-3D6Ay_RYLkTD4hop6fUxuEJDnzl0YqsSNeNYSEehN0NzOWzgQZVMMprsBQo3s/w400-h250/kde-plasma-froen.png" width="400" /></a></div><div><br /></div>In this image, you can see that the notification toast window has no visible content and instead looks like the KDE background image. Also, the time is locked at 5:29 PM, which is when this problem happened (I didn't get back to my desk until 8:30 AM the next morning).<div><br /></div><div>The general fix for this is to use a shell (if you have one open, great; if not, press ctrl+alt+F2 to jump to the console) and kill "plasmashell":</div><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><div style="text-align: left;"><span style="font-family: courier; font-size: x-small;">killall plasmashell</span></div></blockquote><div><br /></div><div>Once that's done, your window manager should be less broken, but it won't have the taskbar, etc. 
From there, you can press alt+F2 to open the "run" window, and type in:</div><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><div style="text-align: left;"><span style="font-family: courier; font-size: x-small;">plasmashell --replace</span></div></blockquote><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghCUJV8zkRVbZGDnoRlo46PEJjbcyDmbmg2KWacesizq69dALmx8MRibk1dC5oQg4A4UboDc-9j7D9Z45csH9VcWS40XdC8fq06r5viC5LVqXkANc_WWWO7pNL7DxCNjUtbriOuCb_LWdiw0J9ATQiVLc0HerHUHV-KLF4c5porLbAe4RoqCgml08O/s828/plasmashell-replace.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="166" data-original-width="828" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghCUJV8zkRVbZGDnoRlo46PEJjbcyDmbmg2KWacesizq69dALmx8MRibk1dC5oQg4A4UboDc-9j7D9Z45csH9VcWS40XdC8fq06r5viC5LVqXkANc_WWWO7pNL7DxCNjUtbriOuCb_LWdiw0J9ATQiVLc0HerHUHV-KLF4c5porLbAe4RoqCgml08O/w400-h80/plasmashell-replace.png" width="400" /></a></div><br /><div class="separator" style="clear: both; text-align: left;">You can also run this from a terminal somewhere, but you need to make sure that your "DISPLAY" environment variable is set up correctly, etc. I find it easier to do it it from the run window (and I don't have to worry about redirecting its output anywhere, since "plasmashell" does generate some logging noise).</div><div><p><br /></p></div>Douglas Danger Manleyhttp://www.blogger.com/profile/17044194571403366472noreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-1882297706865191862023-02-10T16:01:00.004-05:002023-02-10T16:01:53.861-05:00Using a dynamic PVC on Kubernetes agents in Jenkins<p style="text-align: left;">I recently had to create a Jenkins job that needed to use a lot of disk space. The short version of the story is that the job needed to dump the contents of a Postgres database and upload that to Artifactory, and the "jfrog" command line tool won't let you stream an upload, so the entire dump had to be present on disk in order for it to work.</p><p style="text-align: left;">I run my Jenkins on Kubernetes, and the Kubernetes hosts absolutely didn't have the disk space needed to dump this database, and it was definitely too big to use a memory-based filesystem.</p><p style="text-align: left;">The solution was to use a <i>dynamic Persistent Volume Claim</i>, which is maybe(?) implemented as an <a href="https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/">ephemeral volume</a> in Kubernetes, but the exact details of what it does under the hood aren't important. What is important is that, as part of the job running, a new Persistent Volume Claim (PVC) gets created and is available for all of the containers in the pod. When the job finishes, the PVC gets destroyed. Perfect.</p><p style="text-align: left;">I couldn't figure out how to create a dynamic PVC as an ordinary volume that would get mounted on all of my containers (it's a thing, but apparently <a href="https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/KubernetesDeclarativeAgent.java">not for a declarative pipeline</a>), but I was able to get the "workspace" dynamic PVC working.</p><p style="text-align: left;">A "workspace" volume is shared across all of the containers in the pod and have the Jenkins workspace mounted. 
This has all of the Git contents, including the Jenkinsfile, for the job (I'm assuming that you're using Git-based jobs here). Since all of the containers share the same workspace volume, any work done in one container is instantly visible in all of the others, without the need for Jenkins stashes or anything.</p><p style="text-align: left;">The biggest problem that I ran into was the permissions on the "workspace" file system. Each of my containers had a different idea of what the UID of the user running the container would be, and <i>all </i>of the containers have to agree on the permissions around their "workspace" volume.</p><p style="text-align: left;">I ended up cheating and just forcing all of my containers to run as root (UID 0), since (1) everyone could agree on that, and (2) I didn't have to worry about "sudo" not being installed on some of the containers that needed to install packages as part of their setup.</p><h2 style="text-align: left;">Using "workspace" volumes</h2><p style="text-align: left;">To use a "workspace" volume, set <span style="font-family: courier; font-size: x-small;">workspaceVolume</span> inside the <span style="font-family: courier; font-size: x-small;">kubernetes</span> block:</p><p><span style="font-family: courier; font-size: x-small;">kubernetes {<br /> workspaceVolume dynamicPVC(accessModes: 'ReadWriteOnce', requestsSize: "300Gi")<br /> yaml '''<br />---<br />apiVersion: v1<br />kind: Pod<br />spec:<br /> securityContext:<br /> fsGroup: 0<br /> runAsGroup: 0<br /> runAsUser: 0<br /> containers:<br />[...]</span></p><p>In this example, we allocate a 300GiB volume for the duration of the job run.</p><div>In addition, you can see that I set the user and group information to 0 (for "root"), which let me work around all the annoying UID mismatches across the containers. If you only have one container, then obviously you don't have to do this. Also, if you have full control of your containers, then you can probably set them up with a known user with a fixed UID who can sudo, etc., as necessary.</div><p style="text-align: left;">For more information about using Kubernetes agents in Jenkins, see <a href="https://plugins.jenkins.io/kubernetes/">the official docs</a>, but (at least as of the time of this writing) they're missing a whole lot of information about volume-related things.</p><h2 style="text-align: left;">Troubleshooting</h2><p style="text-align: left;">If you see Jenkins trying to create and then delete pods over and over and over again, you have something else wrong. In my case, the Kubernetes service account that Jenkins uses didn't have any permissions around "persistentvolumeclaims" objects, so every time that the Pod was created, it would fail and try again.</p><p style="text-align: left;">I was only able to see the errors in the Jenkins logs in Kubernetes; they looked something like this:</p><p><span style="font-family: courier; font-size: x-small;">Caused: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.100.0.1:443/api/v1/namespaces/cicd/persistentvolumeclaims. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. 
persistentvolumeclaims is forbidden: User "system:serviceaccount:cicd:default" cannot create resource "persistentvolumeclaims" in API group "" in the namespace "cicd".</span></p><p style="text-align: left;">I didn't have the patience to figure out exactly what was needed, so I just gave it everything:</p><div><div><span style="font-family: courier; font-size: x-small;">- verbs:</span></div><div><span style="font-family: courier; font-size: x-small;"> - create</span></div><div><span style="font-family: courier; font-size: x-small;"> - delete</span></div><div><span style="font-family: courier; font-size: x-small;"> - get</span></div><div><span style="font-family: courier; font-size: x-small;"> - list</span></div><div><span style="font-family: courier; font-size: x-small;"> - patch</span></div><div><span style="font-family: courier; font-size: x-small;"> - update</span></div><div><span style="font-family: courier; font-size: x-small;"> - watch</span></div><div><span style="font-family: courier; font-size: x-small;"> apiGroups:</span></div><div><span style="font-family: courier; font-size: x-small;"> - ''</span></div><div><span style="font-family: courier; font-size: x-small;"> resources:</span></div><div><span style="font-family: courier; font-size: x-small;"> - persistentvolumeclaims</span></div></div><p style="text-align: left;"><br /></p>Douglas Danger Manleyhttp://www.blogger.com/profile/17044194571403366472noreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-37564821580289761222023-01-31T09:54:00.001-05:002023-01-31T09:54:50.492-05:00Use a custom login page when using Apache to require sign-in<p style="text-align: left;">Apache has its own built-in authentication system(s) for providing access control to a site that it's hosting. You've probably encountered this before using "basic" authentication backed by a flatfile created and edited using the <span style="font-family: courier; font-size: x-small;">htpasswd</span> command.</p><p style="text-align: left;">If you do this using the common guides on the Internet (for example, <a href="https://httpd.apache.org/docs/2.4/howto/auth.html">this guide from Apache itself</a>), then when you go to your site, you'll be presented with your browser's built-in basic-authentication dialog box asking for a username and password. If you provide valid credentials, then you'll be moved on through to the main site, and if you don't, then it'll dump you to a plain "401 Unauthorized" page.</p><p style="text-align: left;">This works fine, but it has three main drawbacks:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>Password managers (such as LastPass) can't detect this dialog box and autofill it, which is very annoying.</li><li>On some mobile browsers, the dialog gets in the way of normal operations. Even if you have multiple tabs open, whatever tab is trying to get you to log in will get in the way and force you to deal with it.</li><li>If you're using Windows authentication, the browser might detect the 401 error and attempt to sign you in using your domain credentials. 
If the server has a different set of credentials, then it'll mean that you can't actually log in due to Windows trying to auto log in.</li></ol><p style="text-align: left;">(And the built-in popup is really ugly, and it submits the password in plaintext, etc., etc.)</p><h2 style="text-align: left;">Apache "Form" Authentication</h2><p style="text-align: left;">To solve this problem, Apache has a type of <a href="https://httpd.apache.org/docs/2.4/mod/mod_auth_form.html">authentication called "form"</a> that adds an extra step involving an HTML form (that's fully customizable).</p><p style="text-align: left;">The workflow is as follows:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>Create a login HTML page (you'll have to provide the page).</li><li>Register a handler for that page to POST to (Apache already has the handler).</li><li>Update any "Directory" or "Location" blocks in your Apache config to use the "form" authentication type instead of "basic".</li></ol><div>You'll also need these modules installed and enabled:</div><div><ol style="text-align: left;"><li><span style="font-family: courier; font-size: x-small;">mod_auth_form</span></li><li><span style="font-family: courier; font-size: x-small;">mod_request</span></li><li><span style="font-family: courier; font-size: x-small;">mod_session</span></li><li><span style="font-family: courier; font-size: x-small;">mod_session_cookie</span></li></ol><p style="text-align: left;">On Ubuntu, I believe that these were all installed out of the box but needed to be enabled separately. On Red Hat, I had to install the <span style="font-family: courier; font-size: x-small;">mod_session</span> package, but everything was otherwise already enabled.</p></div><h2 style="text-align: left;">Example</h2><p style="text-align: left;">If you want to try out "form" authentication, I recommend that you get everything working with "basic" authentication first. This is especially true if you have multiple directories that need to be configured separately.</p><p style="text-align: left;">For this example, I'm going to use our Nagios server.</p><p style="text-align: left;">There were two directories that needed to be protected: "/usr/local/nagios/sbin" and "/usr/local/nagios/share". 
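Before the conversion, each of those directories had an ordinary "basic" block that looked roughly like this (a sketch based on the standard Nagios setup, not necessarily your exact config):</p><p><span style="font-family: courier; font-size: x-small;"><Directory "/usr/local/nagios/sbin"><br />  AuthType Basic<br />  AuthName "Nagios Access"<br />  AuthBasicProvider file<br />  AuthUserFile /path/to/your/htpasswd.users<br />  Require valid-user<br /></Directory></span></p><p style="text-align: left;">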
This setup is generally described by <a href="https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/cgisecurity.html">this document</a> (although it covers "digest" authentication instead of "basic").</p><p style="text-align: left;">For both directories that already had "AuthType" set up, the changes are simple:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>Change <span style="font-family: courier; font-size: x-small;">AuthType Basic</span> to <span style="font-family: courier; font-size: x-small;">AuthType Form</span>.</li><li>Change <span style="font-family: courier; font-size: x-small;">AuthBasicProvider</span> to <span style="font-family: courier; font-size: x-small;">AuthFormProvider</span>.</li><li>Add the login redirect: <span style="font-family: courier; font-size: x-small;">AuthFormLoginRequiredLocation "/login.html"</span></li><li>Enable sessions: <span style="font-family: courier; font-size: x-small;">Session On</span></li><li>Set a cookie name: <span style="font-family: courier; font-size: x-small;">SessionCookieName session path=/</span></li></ol><p style="text-align: left;"><span style="font-family: inherit;">I decided to put my login page at "/login.html" because that makes sense, but you could put it anywhere (and even host it on a different server if you specify a full URL instead of just a path).</span></p><p style="text-align: left;"><span style="font-family: inherit;">That page should contain a "form" with two "input" elements: "httpd_username" and "httpd_password". The form "action" should be set to "/do-login.html" (or whatever handler you want to register with Apache).</span></p><p style="text-align: left;"><span style="font-family: inherit;">At its simplest, "login.html" looks like this:</span></p><p><span style="font-family: courier; font-size: x-small;"><form method="POST" action="<b>/do-login.html</b>"><br /> Username: <input type="text" name="<b>httpd_username</b>" value="" /><br /> Password: <input type="password" name="<b>httpd_password</b>" value="" /><br /> <input type="submit" name="login" value="Login" /><br /></form></span></p><p></p><p></p><p></p><p style="text-align: left;">You'll probably want an "html" tag, a title and body and such, maybe some CSS, but this'll get the job done.</p><p style="text-align: left;">The last step is to register the thing that'll actually process the form data: "/do-login.html"</p><p style="text-align: left;">In your Apache config, add a "location" for it:</p><p><span style="font-family: courier; font-size: x-small;"><Location "/do-login.html"><br /> SetHandler form-login-handler<br /><br /> AuthType form<br /> AuthName "Nagios Access"<br /> AuthFormProvider file<br /> AuthUserFile /path/to/your/htpasswd.users<br /><br /> AuthFormLoginRequiredLocation "/login.html"<br /> AuthFormLoginSuccessLocation "/nagios/"<br /><br /> Session On<br /> SessionCookieName session path=/<br /></Location></span></p><p style="text-align: left;">The key thing here is <span style="font-family: courier; font-size: small;">SetHandler form-login-handler</span>. This tells Apache to use its built-in form handler to take the values from <span style="font-family: courier; font-size: small;">httpd_username</span> and <span style="font-family: courier; font-size: small;">httpd_password</span> and compare them against your authentication provider(s) (in this example, it's just a flatfile, but you could use LDAP, etc.).</p><p style="text-align: left;">The other two options handle the last bit of navigation. 
<span style="font-family: courier; font-size: small;">AuthFormLoginRequiredLocation</span> sends you back to the login page if the username/password combination didn't work (you could potentially have <i>another </i>page here with an error message pre-written). <span style="font-family: courier; font-size: small;">AuthFormLoginSuccessLocation</span> sends you to the place where you want the user to go after login (I'm sending the user to the main Nagios page, but you could send them anywhere).</p><p style="text-align: left;"></p><h2 style="text-align: left;">Notes</h2><h3 style="text-align: left;">Other Authentication Providers</h3><p style="text-align: left;">I've just covered the "file" authentication provider here. If you use "ldap" and/or any others, then that config will need to be copied to every single place where you have "form" authentication set up, just like you would if you were only using the "file" provider.</p><p style="text-align: left;">I found this to be really annoying, since I had two directories to protect plus the form handler, so that brings over another 4 lines or so to each config section, but what matters is that it works.</p><p style="text-align: left;"><br /></p><p></p>Douglas Danger Manleyhttp://www.blogger.com/profile/17044194571403366472noreply@blogger.com4tag:blogger.com,1999:blog-1498439248860252027.post-12745120939091833862022-10-19T15:37:00.001-04:002022-10-19T15:37:11.433-04:00Watch out for SNI when using an nginx reverse proxy<p style="text-align: left;">From time to time, I'll have a use case where some box needs to talk to some website that it can't reach (through networking issues), and the easiest thing to do is to throw an nginx reverse proxy on a network that it <i>can </i>reach (such that the reverse proxy can reach <i>both</i>).</p><p style="text-align: left;">The whole shtick<i> </i>of a reverse proxy is that you can access the reverse proxy <i>directly </i>and it'll forward the request on to the appropriate destination and more or less masquerade itself as if it were the destination. This is in contrast with a normal HTTP proxy that would be configured <i>separately </i>(if supported by whatever tool you're trying to use). 
Sometimes a normal HTTP proxy is the best tool for the job, but sometimes you can cheat with a tweak to <span style="font-family: courier; font-size: x-small;">/etc/hosts</span> and a reverse proxy and nobody needs to know what happened.</p><p style="text-align: left;">Here, we're focused on the reverse proxy.</p><p style="text-align: left;">In this case, we have the following scenario:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>Box 1 wants to connect to site1.example.com.</li><li>Box 1 cannot reach site1.example.com.</li></ol><div>To cheat using a reverse proxy, we need Box 2, which:</div><div><ol style="text-align: left;"><li>Can be reached by Box 1.</li><li>Can reach site1.example.com.</li></ol><div>To set up the whole reverse proxy thing, we need to:</div><div><ol style="text-align: left;"><li>Set up nginx on Box 2 to listen on port 443 (HTTPS) and reverse proxy to site1.example.com.</li><li>Update <span style="font-family: courier; font-size: small;">/etc/hosts</span> on Box 1 so that site1.example.com points to Box 2's IP address.</li></ol></div></div><p></p><p style="text-align: left;">At first, I was seeing this error message on the reverse proxy's "nginx/error.log":</p><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><p style="text-align: left;"><span style="color: #666666; font-family: courier; font-size: x-small;">connect() to XXXXXX:443 failed (13: Permission denied) while connecting to upstream, client: XXXXXX, server: site1.example.com, request: "GET / HTTP/1.1"</span></p></blockquote><p style="text-align: left;">"Permission denied" isn't great, and that told me that it was something OS-related.</p><p style="text-align: left;">Of course, it was an SELinux thing (in <span style="font-family: courier; font-size: x-small;">/var/log/messages</span>):</p><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><p style="text-align: left;"><span style="color: #666666; font-family: courier; font-size: x-small;">SELinux is preventing /usr/sbin/nginx from name_connect access on the tcp_socket port 443.</span></p></blockquote><p>The workaround was:</p><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><p style="text-align: left;"><span style="color: #666666; font-family: courier; font-size: x-small;">setsebool -P nis_enabled 1</span></p></blockquote><p>This also was suggested by the logs, but it didn't seem to matter:</p><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><p style="text-align: left;"><span style="color: #666666; font-family: courier; font-size: x-small;">setsebool -P httpd_can_network_connect 1</span></p></blockquote><p style="text-align: left;">After fixing that, I was seeing:</p><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><p style="text-align: left;"><span style="color: #666666; font-family: courier; font-size: x-small;">SSL_do_handshake() failed (SSL: error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:SSL alert number 40) while SSL handshaking to upstream, client: XXXXXX, server: site1.example.com, request: "GET / HTTP/1.1"</span></p></blockquote><p style="text-align: left;">After tcpdump-ing the traffic from Box 1 and also another box that could directly talk to site1.example.com, it was clear Box 1 was not using SNI in its requests (SNI is a TLS extension that passes the host name in plaintext so that proxies and load balancers can properly route name-based requests).</p><p style="text-align: left;">It took way too long for me to 
find the nginx setting to enable it (I don't know why it's disabled by default), but it's:</p><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><p style="text-align: left;"><span style="color: #666666; font-family: courier; font-size: x-small;">proxy_ssl_server_name on;</span></p></blockquote><p style="text-align: left;">Anyway, the final nginx config for the reverse proxy on Box 2 was:</p><p><span style="color: #666666; font-family: courier; font-size: x-small;">server {<br /> listen 443 ssl;<br /> server_name site1.example.com;<br /><br /> ssl_certificate /etc/nginx/ssl/server.crt;<br /> ssl_certificate_key /etc/nginx/server.key;<br /> ssl_protocols TLSv1.2;<br /> <br /> location / {<br /> proxy_pass https://site1.example.com;<br /> proxy_ssl_session_reuse on;<br /> proxy_ssl_server_name on;<br /> }<br />}</span></p><p>As far as Box 1 was concerned, it could connect to site1.example.com with only a small tweak to <span style="font-family: courier; font-size: small;">/etc/hosts</span>.</p>Douglas Danger Manleyhttp://www.blogger.com/profile/17044194571403366472noreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-85061002285830706192022-10-05T22:39:00.000-04:002022-10-05T22:39:01.768-04:00Speed up PNG encoding in Go with NRGBA images<p style="text-align: left;">After <a href="https://blog.sensecodons.com/2022/10/100-cost-savings-switching-from-app.html">migrating my application from Google App Engine to Google Cloud Run</a>, I suddenly had a use case for optimizing CPU utilization.</p><p style="text-align: left;">In my analysis of my most CPU-intensive workloads, it turned out that the majority of the time was spent encoding PNG files.</p><p style="text-align: left;"><b>tl;dr Use <span style="font-family: courier; font-size: x-small;">image.NRGBA</span> when you intend to encode a PNG file.</b></p><p style="text-align: left;">(For reference, this particular application has a Google Maps overlay that synthesizes data from other sources into tiles to be rendered on the map. The main synchronization job runs nightly and attempts to build or download new tiles for the various layers based on data from various ArcGIS systems.)</p><p style="text-align: left;">Looking at my code, I couldn't really reduce the <i>number </i>of calls to <span style="font-family: courier; font-size: x-small;">png.Encode</span>, but that encoder really looked inefficient. I deleted the callgrind files (sorry), but basically, half of the CPU time in <span style="font-family: courier; font-size: x-small;">png.Encode</span> was around memory operations and some <span style="font-family: courier; font-size: x-small;">runtime</span> calls.</p><p style="text-align: left;">I started looking around for maybe some options to pass to the encoder or a more purpose-built implementation. I ended up finding <a href="https://pkg.go.dev/github.com/fumin/png">a package that mentioned a speedup</a>, but only for NRGBA images. 
However, that package looked fairly unused, and I wasn't about to turn all of my image processing over to something with 1 commit and no users.</p><p style="text-align: left;">This got me thinking, though: what is NRGBA?</p><p style="text-align: left;">It turns out that there are (at least) two ways of thinking about the whole alpha channel thing in images:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>In RGBA, each of the red, green, and blue channels has already been <i>premultiplied </i>by the alpha channel, such that the value of, for example, R can range from 0 to A, but no higher.</li><li>In NRGBA, each of the red, green, and blue channels has its original value, and the alpha channel merely represents the opacity of the pixel in general.</li></ol><p style="text-align: left;">For my human mind, using various tools and software over the years, when I think of "RGBA", I think of "one channel each for red, green, and blue, and one channel for the opacity of the pixel". So what this means is that I'm thinking of "NRGBA" (for non-premultiplied RGBA).</p><p style="text-align: left;">(Apparently there are good use cases for both, and when compositing, at some point you'll have to multiply by the alpha value, so "RGBA" already has that done for you.)</p><p style="text-align: left;">Okay, whatever, so what does this have to do with CPU optimization?</p><p style="text-align: left;">In Go, the <span style="font-family: courier; font-size: x-small;">png.Encode</span> function is <i>optimized for NRGBA images</i>. There's a tiny little hint about this in <a href="https://pkg.go.dev/image/png#Encode">the comment for the function</a>:</p><blockquote><p><span style="color: #666666;">Any <span style="font-family: courier; font-size: x-small;">Image</span> may be encoded, but images that are not <span style="font-family: courier; font-size: x-small;">image.NRGBA</span> might be encoded lossily.</span></p></blockquote><p></p><p>This is corroborated by <a href="https://www.w3.org/TR/PNG-Rationale.html">the PNG rationale document</a>, which explains that</p><blockquote><p><span style="color: #666666;">PNG uses "unassociated" or "non-premultiplied" alpha so that images with separate transparency masks can be stored losslessly.</span></p></blockquote><p>If you want to have the best PNG encoding experience, then you should encode images that use NRGBA already. In fact, <a href="https://cs.opensource.google/go/go/+/refs/tags/go1.19.2:src/image/png/writer.go;l=454">if you open up the code</a>, you'll see that it will <i>convert the image to NRGBA </i>if it's not already in that format.</p><p>Coming back to my callgrind analysis, <i>this </i>is where all that CPU time was spent: converting an RGBA image to an NRGBA image. I certainly thought that it was strange how much work was being done creating a simple PNG file from a mostly-transparent map tile.</p><p>Why did I even have RGBA images? Well, my tiling API has to composite tiles from other systems into a single PNG file, so I simply created that new image with <span style="font-family: courier; font-size: x-small;">image.NewRGBA</span>. And why that function? Because as I mentioned before, I figured "RGBA" meant "RGB with an alpha channel", which is what I wanted so that it would support transparency. 
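For anyone with a similar tiling/compositing setup, a minimal sketch of the fix described below (with the actual compositing omitted) looks like this:</p><p><span style="font-family: courier; font-size: x-small;">package main<br /><br />import (<br />  "image"<br />  "image/png"<br />  "os"<br />)<br /><br />func main() {<br />  // Allocate the tile as NRGBA so that png.Encode can write it out directly<br />  // instead of first converting a premultiplied RGBA image pixel by pixel.<br />  tile := image.NewNRGBA(image.Rect(0, 0, 256, 256)) // was image.NewRGBA(...)<br /><br />  // ... composite the source tiles into "tile" here ...<br /><br />  f, err := os.Create("tile.png")<br />  if err != nil {<br />    panic(err)<br />  }<br />  defer f.Close()<br />  if err := png.Encode(f, tile); err != nil {<br />    panic(err)<br />  }<br />}</span></p><p>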
It <i>never </i>occurred to me that "RGBA" was some weird encoding scheme for pixels in contrast to another encoding scheme called "NRGBA"; my use cases had never had me make such a distinction.</p><p>Anyway, after switching a few <span style="font-family: courier; font-size: x-small;">image.NewRGBA</span> calls to <span style="font-family: courier; font-size: x-small;">image.NewNRGBA</span> (and literally that was it; no other code changed), my code was way more efficient, cutting down on CPU utilization by something like 50-70%. Those RGBA to NRGBA conversions really hurt.</p>Douglas Danger Manleyhttp://www.blogger.com/profile/17044194571403366472noreply@blogger.com1tag:blogger.com,1999:blog-1498439248860252027.post-54994696202417892932022-10-05T20:50:00.002-04:002022-10-19T11:28:51.604-04:00100% cost savings switching from App Engine to Cloud Run... wild<p>I have written numerous times about how much I like Google's App Engine platform and about how I've tried to port as many of my applications to it as possible. The <i>idea </i>of App Engine is still glorious, but I have now been converted to Cloud Run.</p><p>The primary selling point? It has literally reduced my application's bill to $0. That's a 100% reduction in cost from App Engine to Cloud Run. It's absolutely wild, and it's now my new favorite thing.</p><p>Pour one out for App Engine because Cloud Run is here to stay.</p><h2 style="text-align: left;">Aside: a (brief) history of App Engine</h2><p style="text-align: left;">Before I explain <i>how </i>Cloud Run manages to make my application free to run, I need to first run you through a brief history of App Engine, why it was great, why it became less great, and how Cloud Run could eat its lunch like that.</p><h3 style="text-align: left;">App Engine v1</h3><p style="text-align: left;">Google App Engine entered the market in 2011 as a one-stop solution to host infinitely scalable applications. Prior to App Engine, if you wanted to host a web application, you had to either commit to a weird framework and go with that framework's cloud solution, or you had to spin up a bunch of virtual machines (or god help you Lambda functions) to handle your web traffic, your database, your file systems, maybe tie into a CDN, and also set up Let's Encrypt or something similar to handle your TLS certificates.</p><p style="text-align: left;">From the perspective of an application owner or application developer, that's a lot of work that doesn't directly deal with <i>application </i>stuff and ends up just wasting tons of everyone's time.</p><p style="text-align: left;">App Engine was Google's answer to that, something that we would now call "serverless".</p><p style="text-align: left;">The premise was simple: upload a zip file with your application, some YAML files with some metadata (concerning scaling, performance, etc.), and App Engine would take care of all the annoying IT stuff for you. In fact, they tried to <i>hide </i>as much of it from you as possible—they wanted you to focus on the application logic, not the nitty gritty backend details.</p><p style="text-align: left;">(This allowed Google to do all kinds of tricks in the background to make things better and faster, since the implementation details of their infrastructure were generally out of bounds for your application.)</p><p style="text-align: left;">App Engine's killer feature was that it bundled a whole bunch of services together, things that you'd normally have to set up yourself and pay extra for. 
This included, but was not limited to:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>The web hosting basics:</li><ol><li>File hosting</li><li>A CDN service</li><li>An execution environment in your language of choice (Java, Python, PHP, etc.)</li></ol><li>A <i>cheap </i>No-SQL database to store all your data</li><li>A memcached memory store</li><li>A scheduler that'll hit any of your web endpoints whenever you want</li><li>A task queue for background tasks</li><li>A service that would accept <i>e-mail </i>and transform it into an HTTP POST request</li></ol><p style="text-align: left;">All of this came with a generous free tier (truly free) where you could try stuff out and generally host small applications with no cost at all. If your application were to run up against the free-tier limits, it would just halt those services until the next day when all of the counters reset and your daily limits restarted.</p><p style="text-align: left;">(Did I mention <i>e-mail</i>? I had so many use cases that dealt with accepting e-mail from other systems and taking some action. This was a godsend.)</p><p style="text-align: left;">The free tier gave you 28 hours of instance time (that is, 28 hours of a single instance of the lowest class). For trying stuff out, that was plenty, especially when your application only had a few users and went unused for hours at a time.</p><p style="text-align: left;">(Yes, if you upgraded your instance type <i>one notch</i> that became 14 hours free, but still.)</p><p style="text-align: left;">To use all of the bonus features, you generally had to link against (more or less) the App Engine SDK, which took care of all of the details behind the scene.</p><p style="text-align: left;">This was everything that you needed to build a scalable web application, and it did its job well.</p><h3 style="text-align: left;">App Engine v2</h3><p style="text-align: left;">Years went by, and Google made some big changes to App Engine. To me, these changes went against the great promise of App Engine, but they make sense when looking at Cloud Run as the App Engine successor.</p><p style="text-align: left;">App Engine v2 added support for newer versions of the languages that App Engine v1 had supported (which was good), but it also took away a lot of the one-stop-shop power (which was bad). The stated reason was to remove a lot of the Google Cloud-specific stuff and make the platform a bit more open and compatible with other clouds.</p><p style="text-align: left;">(While that's generally good, the magic of App Engine was that everything was built in, for free.)</p><p style="text-align: left;">Now, an App Engine application could no longer have free memcached data; instead, you could pay for a Memory Store option. For large applications, it probably didn't matter, but for small ones that cost basically nothing to run, this made memcached untenable.</p><p style="text-align: left;">Similarly, the e-mail service was discontinued, and you were encouraged to move to Twilio SendGrid, which could do something similar. Google was nice enough to give all SendGrid customers a free tier that just about made up for what was lost. The big problem was that all of the built-in tools were gone. 
Previously, the App Engine local development console had a place where you could type in an e-mail and send it to your application; now you had to write your own tools.</p><p style="text-align: left;">The scheduler and task queue systems were promoted out of App Engine and into Google Cloud, but the App Engine tooling continued to support them, so they required only minimal changes on the application side.</p><p style="text-align: left;">The cheap No-SQL database, Datastore, was also promoted out of App Engine, and there were no changes to make whatsoever. Yay.</p><p style="text-align: left;">The App Engine SDK was removed, and an App Engine application was simply another application in the cloud, with annoying dependencies and third-party integrations.</p><p style="text-align: left;">(Yes, technically Google added back support for the App Engine SDK years later, but meh.)</p><p style="text-align: left;">Overall, for people who used App Engine exclusively for purpose-built App Engine things, v2 was a major step down from v1. Yes, it moved App Engine toward a more standards-based approach, but the problem was that there weren't standards around half the stuff that App Engine was doing, and in the push to make it more open and in-line with other clouds, it lost a lot of its power (because other clouds weren't doing what App Engine was doing, which is why I was using Google's App Engine).</p><h2 style="text-align: left;">So, why Cloud Run?</h2><p style="text-align: left;">Cloud Run picks up where App Engine v2 left off. With all of the App Engine-specific things removed, the next logical jump was to give Google a container image instead of a zip file and let it do its thing. App Engine had already experimented with this with the "Flexible" environment, but its pricing was not favorable compared to the "Standard" offering for the kinds of projects that I had.</p><p style="text-align: left;">From a build perspective, Cloud Run gives you a bit more flexibility than App Engine. You can give Google a Dockerfile and have it take care of all the compiling and such, or you can do most of that locally and give Google a much simpler Dockerfile with your pre-compiled application files. I don't want to get too deep into it, but you have options.</p><p style="text-align: left;">Cloud Run obviously runs your application as a container. Did App Engine? Maybe? Maybe not? I figured that it would have, but that's for Google to decide (the magic of App Engine).</p><p style="text-align: left;">But here's the thing: App Engine was <i>billed </i>as an instance, not a container.</p><h3 style="text-align: left;">Billing: App Engine vs. Cloud Run</h3><p style="text-align: left;">I'm going to focus on the big thing, which is instance versus container billing. Everything else is comparable (data traffic, etc.).</p><p style="text-align: left;">App Engine billed you on <i>instance</i> time (wall time), normalized to units of the smallest instance class that they support. To run your application at all, there was a 15-minute minimum cost. If your application wasn't doing anything for 15 minutes, then App Engine would shut it down and you wouldn't be billed for that time. 
But for all other time, anytime that there was at least one instance running, you were billed for that instance (more or less as if you had spun up a VM yourself for the time that it was up).</p><p style="text-align: left;">For a lightweight web application, this kind of billing isn't ideal because most of the work that your application does is a little bit of processing centered around database calls. Most of the time the application is either not doing anything at all (no requests are coming in) or it's waiting on a response from a database. The majority of the instance CPU goes unused.</p><p style="text-align: left;">Cloud Run bills on <i>CPU </i>time, that is, the time that the CPU is <i>actually </i>used. So, for that same application, if it's sitting around doing nothing or waiting on a database response, then it's not being billed. And there's no 15-minute minimum or anything. For example, if your application does some request parsing, permissions checks, etc., and then sits around waiting for 1 second for a database query to return, then you'll be billed for like 0.0001 seconds of CPU time, which is great (because your application did very little actual CPU work in that second it took for the request to complete).</p><p style="text-align: left;">Cloud Run gives you basically 48 hours of "vCPU-seconds" (that is, a single CPU churning at 100%) per month. So for an application that sits around doing nothing most of the time, the odds are that you'll never have to pay for any CPU utilization. For a daily average, this comes out to about 1.5 hours of CPU time free per day. Yes, you also pay for the amount of memory that your app uses while it's handling requests, but my app uses like 70MB of memory <i>because it's a web application</i>.</p><p style="text-align: left;">(For me, my app has a nightly job that eats up a bunch of CPU, and then it does just about nothing all day.)</p><p style="text-align: left;">Overall, for me, this is what moved me to Cloud Run. I'm billed for the resources that I'm <i>actually </i>using when I'm actually using them, and the specifics around instances and such are no longer my concern.</p><p style="text-align: left;">This also means that I can use a more microservice-based approach, since I'm billed on what my application <i>does</i>, not what resources Google spins up in the background to support it. With App Engine, running separate "services" would have been too costly (each with its own instances, etc.). With Cloud Run, it's perfect, and I can experiment without needing to worry about massive cost changes.</p><h2 style="text-align: left;">What's the catch?</h2><p style="text-align: left;">App Engine's deployment tools are top-notch, even in App Engine v2. When you deploy your application (once you figure out all the right flags), it'll get your application up and running, deploy your cron jobs, and ensure that your Datastore indexes are pushed.</p><p style="text-align: left;">With Cloud Run, you don't get any of the extra stuff.</p><p style="text-align: left;">I've put together a list of things that you'll have to do to migrate your application from App Engine to Cloud Run, sorted from easiest to hardest.</p><h3 style="text-align: left;">Easy: Deploy your application</h3><p style="text-align: left;">This is essentially <span style="font-family: courier; font-size: x-small;">gcloud run deploy</span> with some options, but it's extremely simple and straightforward. 
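For reference, a minimal deploy looks something like this (the service name, image, and region here are placeholders; your flags will differ):</p><p><span style="font-family: courier; font-size: x-small;">gcloud run deploy my-service \<br />  --image gcr.io/my-project/my-service:latest \<br />  --region us-east1 \<br />  --allow-unauthenticated \<br />  --env-vars-file env.yaml</span></p><p style="text-align: left;">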
Migrating from App Engine will take like 2 seconds.</p><h3>Easy: Get rid of <span style="font-family: courier; font-size: small;">app.yaml</span></h3><p style="text-align: left;">You won't need <span style="font-family: courier; font-size: x-small;">app.yaml</span> anymore, so remove it. If you had environment variables in there, move them to their own file and update your <span style="font-family: courier; font-size: x-small;">gcloud run deploy</span> command to include <span style="font-family: courier; font-size: x-small;">--env-vars-file</span>.</p><p style="text-align: left;">In Cloud Run, everything is HTTPS, so you don't have to worry about insecure HTTP endpoints.</p><p style="text-align: left;">There are no "warmup" endpoints or anything, so just make sure that your container does what it needs to when it starts because as soon as it can receive an HTTP request, it will.</p><p style="text-align: left;">If you had configured a custom instance class, you may need to tweak your Cloud Run service with parameters that better match your old settings. I had only used an F2 class because the F1 class had a limitation on the number of concurrent requests that it could receive, and Cloud Run has no such limitation. You can also configure the maximum number of concurrent requests much, <i>much </i>higher.</p><p style="text-align: left;">Similarly, all of the scaling parameters are... different, and you can deal with those in Cloud Run as necessary. Let your app run for a while and see how Cloud Run behaves; it's definitely a different game from App Engine.</p><h3 style="text-align: left;">Easy: Deploy your task queues</h3><p style="text-align: left;">Starting with App Engine v2, you already had to deploy your own task queues (App Engine v1 used to take care of that for you), so you don't have to change a single thing with <i>deployment.</i> Note that you <i>will </i>have to change how those task queues are used; see below.</p><h3 style="text-align: left;">Easy: Deploy your Datastore indexes</h3><p style="text-align: left;">This is essentially <span style="font-family: courier; font-size: x-small;">gcloud datastore indexes create index.yaml</span> with some options, and you should already have <span style="font-family: courier; font-size: x-small;">index.yaml</span> sitting around, so just add this to your deployment code.</p><h3 style="text-align: left;">Easy: Rip out all "/_ah" endpoints</h3><p style="text-align: left;">App Engine liked to have you put special App Engine things under "/_ah", so I stashed all of my cron job and task queue endpoints in there. Ordinarily, this would be fine, except that Cloud Run quietly treats "/_ah" as special and will quietly fail all requests to any endpoints under it with a permission-denied error. I wasted way too much time trying to figure out what was going on before I realized that it was Cloud Run doing something sneaky and undocumented in the background.</p><p style="text-align: left;">Move any endpoints under "/_ah" to literally any other path and update any references to them.</p><h3 style="text-align: left;">Easy: Create a Dockerfile</h3><p style="text-align: left;">Create a Dockerfile that's as simple or as complex as you want. 
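For a statically compiled Go binary that's built locally, a minimal sketch looks something like this (file names are placeholders, and I'm using "alpine" for the reasons covered near the end of this post):</p><p><span style="font-family: courier; font-size: x-small;">FROM alpine<br /><br /># CA certificates and timezone data; see the "alpine" vs. "scratch" notes below.<br />RUN apk add --no-cache ca-certificates tzdata && echo UTC > /etc/timezone<br /><br />COPY myapp /myapp<br />COPY static/ /static/<br /><br />ENTRYPOINT ["/myapp"]</span></p><p style="text-align: left;">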
I wanted a minimal set of changes from my current workflow, so I had my Dockerfile just include all of the binaries that get compiled locally as well as any static files.</p><p style="text-align: left;">If you want, you can set up a multi-stage Dockerfile that has separate, short-lived containers for building and compiling that ultimately ends up with a single, small container for your application. I'll eventually get there, but actually <i>deploying </i>into Cloud Run took precedence for me over neat CI/CD stuff.</p><h3 style="text-align: left;">Easy: Detect Cloud Run in production</h3><p style="text-align: left;">I have some simple code that detects if it's in Google Cloud or not so that it does smart production-related things if it is. In particular, logging in production is JSON-based so that StackDriver can pick it up properly.</p><p style="text-align: left;">The old "GAE_*" and "GOOGLE_*" environment variables are gone; you'll only get "K_SERVICE" and friends to tell you a <i>tiny </i>amount of information about your Cloud Run environment. You'll basically get back the service name and that's it.</p><p style="text-align: left;">If you want your project ID or region, you'll have to pull those from the "computeMetadata/v1" endpoints. It's super easy; it's just something that you'll have to do.</p><p style="text-align: left;">For more information on using the "computeMetadata/v1" endpoints, see <a href="https://cloud.google.com/run/docs/securing/service-identity">this guide</a>.</p><p style="text-align: left;">For more information about the environment variables available in Cloud Run, see <a href="https://cloud.google.com/run/docs/container-contract">this document</a>.</p><h3 style="text-align: left;">Easy: "Unlink" your Datastore</h3><p style="text-align: left;">Way back in the day, Datastore was merely a part of App Engine. At some point, Google spun it out into its own thing, but for older projects, it's still tied to App Engine. Make sure that you go to your Datastore's admin screen in Google Cloud Console and <i>unlink </i>it from your App Engine project.</p><p style="text-align: left;">This has no practical effects other than:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>You can no longer put your Datastore in read-only mode.</li><li>You can disable your App Engine project without also disabling your Datastore.</li></ol><p style="text-align: left;">It may take 10-20 minutes for the operation to complete, so just plan on doing this early so it doesn't get in your way.</p><p style="text-align: left;">To learn more about unlinking Datastore from App Engine, see <a href="https://cloud.google.com/datastore/docs/app-engine-requirement">this article</a>.</p><p></p><h3 style="text-align: left;">Medium: Change your task queue code</h3><p style="text-align: left;">If you were using App Engine for task queues, then you were using special App Engine tasks. Since you're not using App Engine anymore, you have to use the more generic HTTP tasks.</p><p style="text-align: left;">In order to do this, you'll need to know the URL for your Cloud Run service. You can <i>technically </i>cheat and use the information in the HTTP request to generally get a URL that (should) work, but I opted to use the "${region}-run.googleapis.com/apis/serving.knative.dev/v1/namespaces/${project-id}/services/${service-name}" endpoint, which returns the Cloud Run URL, among other things. 
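In shell terms, that lookup boils down to something like this (the region, project, and service names are placeholders; in practice you'd do it from code rather than curl):</p><p><span style="font-family: courier; font-size: x-small;"># 1. Get an access token from the metadata server.<br />curl -s -H "Metadata-Flavor: Google" \<br />  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"<br /><br /># 2. Use the "access_token" value from that JSON to ask the Cloud Run API about the service.<br />curl -s -H "Authorization: Bearer ${TOKEN}" \<br />  "https://us-east1-run.googleapis.com/apis/serving.knative.dev/v1/namespaces/my-project/services/my-service"</span></p><p style="text-align: left;">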
I grabbed that value when my container started and used it anytime I had to create a task.</p><p style="text-align: left;">(And yes, you'll need to get an access token from another "computeMetadata/v1" endpoint in order to access that API.)</p><p style="text-align: left;">For more information on that endpoint, see <a href="https://cloud.google.com/run/docs/reference/rest/v1/namespaces.services/get">this reference</a>.</p><p style="text-align: left;">Once you have the Cloud Run URL, the main difference between an App Engine task and an HTTP task is that App Engine tasks only have the path while the HTTP task has the full URL to hit. I had to change up my internals a bit so that the application knew its base URL, but it wasn't much work.</p><p style="text-align: left;">To learn more about HTTP tasks, see <a href="https://cloud.google.com/tasks/docs/creating-http-target-tasks">this document</a>.</p><div><h3>Medium: Build a custom tool to deploy cron jobs</h3></div><p style="text-align: left;">App Engine used to take care of all the cron job stuff for you, but Cloud Run does not. At the time of this writing, Cloud Run has a beta command that'll create/update cron jobs from a YAML file, but it's not GA yet.</p><p style="text-align: left;">I built a simple tool that calls <span style="font-family: courier; font-size: x-small;">gcloud jobs list</span> and parses the JSON output and compares that to a custom YAML file that I have that describes my cron jobs. Because cron jobs do not refer to an App Engine project, they can reference any HTTP endpoint or a particular Cloud Run service. My YAML file has an option for the Cloud Run service name, and my tool looks up the Cloud Run URL for that service and appends the path onto it.</p><p style="text-align: left;">It's not a lot of work, but it's work that I had to do in order to make my workflows make sense again.</p><p style="text-align: left;">Also note that the human-readable cron syntax ("every 24 hours") is gone and you'll need to use the standard "minute hour day-of-month month day-of-week" syntax that you're used to in most other cron-related things. You can also specify which time zone each cron job should use for interpreting that syntax.</p><p style="text-align: left;">To learn more about the cron syntax, see <a href="https://cloud.google.com/scheduler/docs/configuring/cron-job-schedules">this document</a>.</p><h3 style="text-align: left;">Medium: Update your custom domains</h3><p style="text-align: left;">At minimum, you will now have a completely different URL for your application. If you're using custom domains, you can use Cloud Run's custom domains much like you would use App Engine's custom domains.</p><p style="text-align: left;">However, as far as Google is concerned, this is basically <i>removing </i>a custom domain from App Engine and <i>creating </i>a custom domain in Cloud Run. This means that it'll take anywhere from 20-40 minutes for HTTPS traffic to your domain to work after you make the switch.</p><p style="text-align: left;">Try to schedule this for when you have minimal traffic. As far as I can tell, there's nothing that you can do to speed it up. 
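The mapping itself is just one command (still beta at the time of this writing), something like the following with placeholder names; it's the certificate provisioning afterward that takes all the time:</p><p><span style="font-family: courier; font-size: x-small;">gcloud beta run domain-mappings create \<br />  --service my-service \<br />  --domain www.example.com \<br />  --region us-east1</span></p><p style="text-align: left;">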
You may wish to consider using a load balancer instead (and giving the load balancer the TLS certificate responsibilities), but I didn't want to pay extra for a service that's basically free and only hurts me when I need to make domain changes.</p><h3 style="text-align: left;">Medium: Enable compression</h3><p style="text-align: left;">Because App Engine ran behind Google's CDN, compression was handled by default. Cloud Run does no such thing, so if you want compression, you'll have to do it yourself. (Yes, you could pay for a load balancer, but that doesn't reduce your costs by 100%.) Most HTTP frameworks have an option for it, and if you want to roll your own, it's fairly straightforward.</p><p style="text-align: left;">I ended up going with <a href="https://github.com/NYTimes/gziphandler">https://github.com/NYTimes/gziphandler</a>. You can configure a minimum response size before compressing, and if you don't want to compress <i>everything </i>(for example, image files are already compressed internally these days), you have to provide it with a list of MIME types that <i>should </i>be compressed.</p><h3 style="text-align: left;">Hard: Enable caching</h3><p style="text-align: left;">Because App Engine ran behind Google's CDN, caching was handled by default. If you set a <span style="font-family: courier; font-size: x-small;">Cache-Control</span> header, the CDN would take care of doing the job of an HTTP cache. If you did it right, you could reduce the traffic to your application significantly. Cloud Run does no such thing, and your two practical options are (1) pay for a load balancer (which does not reduce your costs by 100%) or (2) roll your own in some way.</p><p style="text-align: left;">There's nothing stopping you from dropping in a Cloud Run service that is basically an HTTP cache that sits in front of your <i>actual </i>service, but I didn't go that route. I decided to take advantage of the fact that my application knew more about caching than an external cache ever could.</p><p style="text-align: left;">For example, my application knows that its static files can be cached <i>forever </i>because the moment that I replace the service with a new version, the cache will die with it and a new cache will be created. For an external cache, you can't tell it, "hey, cache this until the backing service randomly goes away", so you have to set practical cache durations such as "10 minutes" (so that when you do make application changes, you can be reasonably assured that they'll get to your users fairly quickly).</p><p style="text-align: left;">I ended up writing an HTTP handler that parsed the <span style="font-family: courier; font-size: small;">Cache-Control</span> header, cached things that needed to be cached, supported some extensions for the things that my application could know (such as "cache this file forever"), and served those results, so my main application code could still generally operate exactly as if there were a real HTTP cache in front of it.</p><h3 style="text-align: left;">Extra: Consider using "alpine" instead of "scratch" for Docker</h3><p style="text-align: left;">I use statically compiled binaries (written in Go) for all my stuff, and I wasted a lot of time using "scratch" as the basis for my new Cloud Run Dockerfile.</p><p style="text-align: left;">The tl;dr here is that the Google libraries really, <i>really </i>want to validate the certificates of the Google APIs, and "scratch" does not install any CA certificates or anything. 
I was getting nasty errors about things not working, and it took me a long time to realize that Google's packages were upset about not being able to validate the certificates in their HTTPS calls.</p><p style="text-align: left;">If you're using "alpine", just add <span style="font-family: courier; font-size: x-small;">apk add --no-cache ca-certificates</span> to your build.</p><p style="text-align: left;">Also, if you do any kind of time zone work (all of my application's users and such have a time zone, for example), you'll also want to include the timezone data (otherwise, your date/time calls will fail).</p><p style="text-align: left;">If you're using "alpine", just add <span style="font-family: courier; font-size: x-small;">apk add --no-cache tzdata</span> to your build. Also don't forget to set the system timezone to UTC via <span style="font-family: courier; font-size: small;">echo UTC > /etc/timezone</span>.</p><h3 style="text-align: left;">Extra: You may need your project ID and region for tooling</h3><p style="text-align: left;">Once running, your application can trivially find out what its project ID and region are from the "computeMetadata/v1" APIs, but I found that I needed to know these things in advance while running my deployment tools. Your case may differ from mine, but I needed these in advance.</p><p style="text-align: left;">I have multiple projects (for example, a production one and a staging one), and my tooling needed to know the project ID and region for whatever project it was building/deploying. I added this to a config file that I could trivially replace with a copy from a directory of such config files. I just wanted to mention that you might run into something similar.</p><h3 style="text-align: left;">Extra: Don't forget to disable App Engine</h3><p style="text-align: left;">Once everything looks good, <i>disable </i>App Engine in your project. Make sure that you unlinked<i> </i>your Datastore <i>first</i>, otherwise you'll also disable that when you disable App Engine.</p><h3 style="text-align: left;">Extra: Wait a month before celebrating</h3><p style="text-align: left;">While App Engine's quotas reset every day, Cloud Run's reset every <i>month</i>. This means that you won't necessarily be able to prove out your cost savings until you have spent an entire month running in Cloud Run.</p><p style="text-align: left;">After a few days, you should be able to eyeball the data and check the billing reports to get a sense of what your final costs will be, but remember: you get 48 hours of free CPU time before they start charging you, and if your application is CPU heavy, it might take a few days or weeks to burn through those free hours. Even after going through the free tier, the cost of subsequent usage is quite reasonable. However, the point is that you can't draw a trend line on costs based on the first few days of running. Give it a full month to be sure.</p><h3 style="text-align: left;">Extra: Consider optimizing your code for CPU consumption</h3><p style="text-align: left;">In an instance world like App Engine, as long as your application isn't using up <i>all </i>of the CPU and forcing App Engine to spawn more instances or otherwise have requests queue up, CPU performance isn't all that important. 
Ultimately, it'll cost you the same regardless of whether you can optimize your code to shave off 20% of its CPU consumption.</p><p style="text-align: left;">With Cloud Run, you're charged, in essence, by CPU cycle, so you can drastically reduce your costs by optimizing your code. And if you can keep your stuff efficient enough to stay within the free tier, then your application is basically free. And that's wild.</p><p style="text-align: left;">If you're new to CPU optimization or haven't done it in a while, consider enabling something like "pprof" on your application and then capturing a 30 second workload. You can view the results with a tool like <a href="https://kcachegrind.github.io/html/Home.html">KCachegrind</a> to get a feel for where your application is spending most of its CPU time and see what you can do about cleaning that up. Maybe it's optimizing some for loops; maybe it's caching; maybe it's using a slightly more efficient data structure for the work that you're doing. Whatever it is, find it and reduce it. (And start with the biggest things first.)</p><h2 style="text-align: left;">Conclusion</h2><p style="text-align: left;">Welcome to Cloud Run! I hope your cost savings are as good as mine.</p><div><i>Edit (2022-10-19): added caveats about HTTP compression and caching.</i></div><p></p>Douglas Danger Manleyhttp://www.blogger.com/profile/17044194571403366472noreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-23943942095688971222022-07-10T14:36:00.001-04:002022-07-10T14:36:06.428-04:00Publishing a Docker image for a GitHub repoIt's 2022, and if you're making a GitHub project, chances are that you'll need to publish a Docker image at some point. These days, it's really easy with GitHub CI/CD and their "actions", which generally take care of all of the hard work.<div><br /></div><div>I'm assuming that you already have a working GitHub CI/CD workflow for building whatever it is that you're building, and I'm only going to focus on the Docker-specific changes that need to be made.</div><div><br /></div><div>You'll want to set up the following workflows:</div><div><ol style="text-align: left;"><li>Upon "pull_request" for the "master" branch, build the Docker image (to make sure that the process works), but don't actually publish it.</li><li>Upon "push" for the "master" branch, build the Docker image <i>and </i>publish it as "latest".</li><li>Upon "release" for the "master" branch, build the Docker image <i>and </i>publish it with the release's tag.</li></ol><div>Before you get started, you'll need to create a Dockerhub "access token" to use for your account. You can do this under "Account Settings" → "Security" → "Access Tokens".</div><h2 style="text-align: left;">Workflow updates</h2></div><h4 style="text-align: left;">Upon "pull_request"</h4><div>This workflow happens every time a commit is made to a pull request. 
While we want to ensure that our Docker image is built so that we know that the process works, we don't actually want to publish that image (at least I don't; you might have a use for it).</div><div><br /></div><div>To build the Docker image, just add the following to your "steps" section:</div><div><div><span style="font-family: courier; font-size: x-small;">- name: Set up Docker Buildx</span></div><div><span style="font-family: courier; font-size: x-small;"> uses: docker/setup-buildx-action@v2</span></div><div><span style="font-family: courier; font-size: x-small;">- name: Build</span></div><div><span style="font-family: courier; font-size: x-small;"> uses: docker/build-push-action@v3</span></div><div><span style="font-family: courier; font-size: x-small;"> with:</span></div><div><span style="font-family: courier; font-size: x-small;"> context: .</span></div><div><span style="font-family: courier; font-size: x-small;"> push: false</span></div><div><span style="font-family: courier; font-size: x-small;"> tags: YOUR_GROUP/YOUR_REPO:latest</span></div></div><div><br /></div><div>Simple enough.</div><div><br /></div><div>The "docker/setup-buildx-action" action does whatever magic needs to happen for Docker stuff to work in the pipeline, and the "docker/build-push-action" builds the image from your Dockerfile and pushes the image. Because we're setting "push: false", it won't actually push.</div><div><h4>Upon "merge" into "master"</h4><div>This workflow happens every time a PR is merged into the "master" branch. In this case, we want to do everything that we did for the "pull_request" case, but we also want to push the image.</div><div><br /></div><div>The changes here are that we'll set "push: true" and also specify our Dockerhub username and password.</div><div><br /></div><div>To build and push the Docker image, just add the following to your "steps" section:</div><div><div><span style="font-family: courier; font-size: x-small;">- name: Set up Docker Buildx</span></div><div><span style="font-family: courier; font-size: x-small;"> uses: docker/setup-buildx-action@v2<br /><div>- name: Login to DockerHub</div><div> uses: docker/login-action@v2</div><div> with:</div><div> username: ${{ secrets.DOCKERHUB_USERNAME }}</div><div> password: ${{ secrets.DOCKERHUB_TOKEN }}</div></span></div><div><span style="font-family: courier; font-size: x-small;">- name: Build</span></div><div><span style="font-family: courier; font-size: x-small;"> uses: docker/build-push-action@v3</span></div><div><span style="font-family: courier; font-size: x-small;"> with:</span></div><div><span style="font-family: courier; font-size: x-small;"> context: .</span></div><div><span style="font-family: courier; font-size: x-small;"> push: true</span></div><div><span style="font-family: courier; font-size: x-small;"> tags: YOUR_GROUP/YOUR_REPO:latest</span></div></div><div><br /></div><div>Boom.</div><div><br /></div><div>The new action "docker/login-action" logs into Dockerhub with your username and password, which is necessary to actually push the image.</div><div><h4>Upon "release"</h4><div>This workflow happens every time a release is created. 
This is generally similar to the "merge" case, except instead of using the "latest" tag, we'll be using the release's tag.</div><div><br /></div><div>To build and push the Docker image, just add the following to your "steps" section:</div><div><div><span style="font-family: courier; font-size: x-small;">- name: Set up Docker Buildx</span></div><div><span style="font-family: courier; font-size: x-small;"> uses: docker/setup-buildx-action@v2<br /><div>- name: Login to DockerHub</div><div> uses: docker/login-action@v2</div><div> with:</div><div> username: ${{ secrets.DOCKERHUB_USERNAME }}</div><div> password: ${{ secrets.DOCKERHUB_TOKEN }}</div></span></div><div><span style="font-family: courier; font-size: x-small;">- name: Build</span></div><div><span style="font-family: courier; font-size: x-small;"> uses: docker/build-push-action@v3</span></div><div><span style="font-family: courier; font-size: x-small;"> with:</span></div><div><span style="font-family: courier; font-size: x-small;"> context: .</span></div><div><span style="font-family: courier; font-size: x-small;"> push: true</span></div><div><span style="font-family: courier; font-size: x-small;"> tags: YOUR_GROUP/YOUR_REPO:${{ github.event.release.tag_name }}</span></div></div><div><br /></div><div>And that's it. The "github.event.release.tag_name" variable holds the name of the Git tag, which is what we'll use for the Docker image tag.</div></div>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-89022130672595716472022-05-23T12:50:00.005-04:002022-05-23T12:50:52.811-04:00sed will blow away your symlinks by default<p><span style="font-family: courier;">sed</span> typically outputs to stdout, but <span style="font-family: courier;">sed -i</span> allows you to edit a file “in place”. However, under the hood, it actually creates a new file and then replaces the original file with the new file. This means that <span style="font-family: courier;">sed</span> replaces symlinks with normal files. This is most likely <i>not</i> what you want.</p><p>However, there is a flag to pass to make it work the way that you’d expect:</p><p><span style="font-family: courier;">--follow-symlinks</span></p><div>So, if you're using <span style="font-family: courier;">sed -i</span>, then you probably also want to tack on <span style="font-family: courier;">--follow-symlinks</span>, too.</div>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-83939365125025840262022-04-16T23:16:00.003-04:002022-04-16T23:16:56.891-04:00Golang, http.Client, and "too many open files"<p>I've been having an issue with my application for a while now, and I finally figured out what the problem was. In this particular case, the application is a web app (so, think REST API written in Go), and one of its nightly routines is to synchronize a whole bunch of data with various third-party ArcGIS systems. The application keeps a cache of the ArcGIS images, and this job updates them so they're only ever a day old.
This allows it to show map overlays even if the underlying ArcGIS systems are inaccessible (they're random third-party systems that are frequently down for maintenance).</p><p>So, imagine 10 threads constantly making HTTP requests for new map tile images; once a large enough batch is done, the cache is updated, and then the process repeats until the entire cache has been refreshed.</p><p>In production, I never noticed a direct problem, but there were times when an ArcGIS system would just completely freak out and start lying about not supporting pagination anymore or otherwise spewing weird errors (but again, it's a third-party system, so what can you do?). In development, I would notice this particular endpoint failing after a while with a "dial" error of "too many open files". Every time that I looked, though, everything seemed fine, and I just forgot about it.</p><p>This last time, though, I watched the main application's open sockets ("ss -anp | grep my-application"), and I noticed that the number of connections just kept increasing. This reminded me of my old networking days, and it looked like the TCP connections were just accumulating until the OS felt like closing them due to inactivity.</p><p>That's when I found that Go's "http.Client" has a method called "CloseIdleConnections()" that immediately closes any idle connections without waiting for the OS to do it for you.</p><p>For reasons that are not relevant here, each request to a third-party ArcGIS system uses its own "http.Client", and because of that, there was no way to reuse any connections between requests, and the application just kept racking up open connections, eventually hitting the default limit of 1024 "open files". I simply added "defer httpClient.CloseIdleConnections()" after creating the "http.Client", and everything magically behaved as I expected: no more than 10 active connections at any time (one for each of the 10 threads running).</p><p>So, if your Go application is getting "too many open files" errors when making a lot of HTTP requests, be sure to either (1) re-architect your application to reuse your "http.Client" whenever possible, or (2) be sure to call "CloseIdleConnections()" on your "http.Client" as soon as you're done with it.</p><p>I suspect that some of the third-party ArcGIS issues that I was seeing in production might have essentially been DoS errors caused by my application assaulting these poor servers with thousands of connections.</p>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-82152414360925305572022-04-02T23:47:00.002-04:002022-04-02T23:55:03.031-04:00Service workers, push notifications, and IndexedDB<p>I have a pretty simple use case: a user wanted my app to provide a list of "recent" notifications that had been sent to her. Sometimes a lot of notifications will come through in a relatively short time period, and she wants to be able to look at the list of them to make sure that she's handled them all appropriately.</p><p>I ended up having the service worker write the notification to an IndexedDB and then having the UI reload the list of notifications when it receives a "message" event from the service worker.</p><p>Before we get there, I'll walk you through my process because it was painful.</p><h2 style="text-align: left;">Detour: All the mistakes that I made</h2><p style="text-align: left;">Since I was already using HTML local storage for other things, I figured that I would just record the list of recent notifications in there. 
Every time that the page would receive a "message" event, it would add the event data to a list of notifications in local storage. That <i>kind of worked </i>as long as I was debugging it. As long as I was looking at the page, <i>the page was open</i>, and it would receive the "message" event.</p><p style="text-align: left;">However, in the real world, my application is installed as a "home screen" app on Android and is usually closed. When a notification arrived, there was no open page to receive the "message" event, and it was lost.</p><p style="text-align: left;">I then tried to have the service worker write to HTML local storage instead. It wouldn't matter which side (page or service worker) actually wrote the data since both sides would detect a change immediately. Except that's not how it works. Service workers can't use HTML local storage because it's a synchronous API, and service workers aren't allowed to use synchronous storage.</p><p style="text-align: left;">Anyway, HTML local storage was impossible as a simple communication and storage mechanism.</p><p style="text-align: left;">Because the page was usually not open, MessageChannel and BroadcastChannel also wouldn't work.</p><p style="text-align: left;">I finally settled on using IndexedDB because a service worker is allowed to use it. The biggest annoyance (in the design) was that there is no way to have a page "listen" for changes to an IndexedDB, so I couldn't just trivially tell my page to update the list of notifications to display when there was a change to the database.</p><p style="text-align: left;">After implementing IndexedDB, I spent a week trying to figure out why it wasn't working half the time, and that leads us to how service workers actually work.</p><h2 style="text-align: left;">Detour: How service workers work</h2><p style="text-align: left;">Service workers are often <i>described</i> as a background process for your page. The way that you hear about them, they sound like daemons that are always running and process events when they receive them.</p><p style="text-align: left;">But that's not anywhere near correct in terms of how they are implemented. Service workers are more like "serverless" functions (such as Google Cloud Functions) in that they generally aren't running, but if a request comes in that they need to handle, then one is spun up to handle the request, and it'll be kept around for a few minutes in case any other requests come in for it to handle, and then it'll be shut down.</p><p style="text-align: left;">So my big mistake was thinking that once I initialized something in my service worker then it would be available more or less indefinitely. The browser knows what events a service worker has registered ("push", "message", etc.) and can spin up a new worker whenever it wants, typically to handle such an event and then shut it down again shortly thereafter.</p><p style="text-align: left;">Service workers have an "install" event that gets run when <i>new </i>service worker code gets downloaded. This is intended to be run <i>exactly once</i> for that version of the service worker.</p><p style="text-align: left;">There is also an "activate" event that gets run when an <i>actual</i> worker has been assigned to the task. You can basically view this as an event that gets run <i>once </i>when a service worker process starts running, regardless of how many times this particular code has been run previously.
If you need to initialize some global things for later functions to call, you should do it here.</p><p style="text-align: left;">The "push" event is run when a push message has been received. Whatever work you need to do should be done in the event's "waitUntil" method as a promise chain that ultimately results in showing a notification to the user.</p><h2 style="text-align: left;">Detour: How IndexedDB works</h2><p style="text-align: left;">IndexedDB was seemingly invented by people who had no concept of Promises in JavaScript. Its API is entirely insane and based on "onsuccess", "oncomplete", and "onerror" callbacks. (You can technically also use event listeners, but it's just as insane.) It's an asynchronous API that doesn't use any of the standard asynchronous syntax as anything else in modern JavaScript. It is what it is.</p><p style="text-align: left;">Here's what you need to know: everything in IndexedDB is callbacks. Everything. So, if you want to connect to a database, you'll need to make an IDBRequest and set the "onsuccess" callback. Once you have the database, you'll need to create a transaction and set the "oncomplete" callback. Then you can create another IDBRequest for reading or writing data from an object store (essentially a table) and setting the "onsuccess" callback. It's callback hell, but it is what it is. (Note that there are wrapper libraries that provide Promise-based syntax, but I hate having to wrap a standard feature for no good reason.)</p><p style="text-align: left;">(Also, there's an "onupgradeneeded" callback at the database level that you can use to do any schema- or data-related work if you're changing the database version.)</p><h2 style="text-align: left;">Putting it all together</h2><p style="text-align: left;">I decided that there was no reason to waste cycles opening the IndexedDB on "activate" since there's no guarantee that it'll actually be used. Instead, I had the "push" event use the previous database connection (if there was one) or create a new connection (if there wasn't).</p><p style="text-align: left;">I put together the following workflow for my service worker:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>Register the "push" event handler ("event.waitUntil(...)"):</li><ol><li>(Promise) Connect to the IndexedDB.</li><ol><li>If we already have a connection from a previous call, then return that.</li><li>Otherwise, connect to the IndexedDB and return that (and also store it for quick access the next time so we don't have to reconnect).</li></ol><li>(Promise) Read the list of notifications from the database.</li><li>(Promise) Add the new notification to the list and write it back to the database.</li><li>(Promise) Fire a "message" event to all active pages and show a notification if no page is currently visible to the user.</li></ol></ol><div>And for my page:</div><div><ol style="text-align: left;"><li>Load the list of notifications from the IndexedDB when the page loads. (This sets our starting point, and any changes will be communicated by a "message" event from the service worker.)</li><li>Register the "message" event handler:</li><ol><li>Reload the list of notifications from the IndexedDB. 
(Remember, there's no way to be notified on changes, so receiving the "message" event and reloading is the best that we can do.)</li><li>(Handle the message normally; for me, this shows a little toast with the message details and a link to click on to take the user to the appropriate screen.)</li></ol></ol></div><p style="text-align: left;">For me, the database work is a nice-to-have; the notification is the critical part of the workflow. So I made sure that every database-related error was handled and the Promises resolved no matter what. This way, even if there was a completely unexpected database issue, it would just get quietly skipped and the notification could be shown to the user.</p><p style="text-align: left;">In my code, I created some simple functions (to deal with the couple of IndexedDB interactions that I needed) that return Promises so I could operate normally. You could technically just do a single "new Promise(...)" to cover all of the IndexedDB work if you wanted, or you could use one of those fancy wrapper libraries. In any case, you <i>must </i>call "event.waitUntil" with a Promise chain that ultimately resolves after doing something with the notification. How you get there is up to you.</p><p style="text-align: left;">I was also using IndexedDB as asynchronous local storage, so I didn't need fancy keys or sorting or anything. I just put all of my data under a single key that I could "get" and "put" trivially without having to worry about row counts or any other kind of data management. There's a single object store with a single row in it.</p><p></p>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-8728193604348382412022-03-03T00:03:00.002-05:002022-03-03T00:03:34.100-05:00Dump your SSH settings for quick troubleshooting<p>I recently had a Jenkins job that would die, seemingly randomly. The only thing that really stood out was that it would tend to succeed if the runtime was 14 minutes or less, and it would tend to fail if the runtime was 17 minutes or more.</p><p>This job did a bunch of database stuff (through an SSH tunnel; more on that soon), so I first did a whole bunch of troubleshooting on the Postgres client and server configs, but nothing seemed relevant. It seemed to disconnect ("connection closed by server") on these long queries that would sit there for a long time (maybe around 15 minutes or so) and then come back with a result. After ruling out the Postgres server (all of the settings looked good, and new sessions had decent timeout configs), I moved on to SSH.</p><p>This job connects to a database by way of a forwarded port through an SSH tunnel (don't ask why; just understand that it's the least worst option available in this context). I figured that maybe the SSH tunnel was failing, since I start it in the background and have it run "sleep infinity" and then never look at it again. However, when I tested locally, my SSH session would run for multiple days without a problem.</p><p>Spoiler alert: the answer ended up being the client config, but how do you actually find that out?</p><p>SSH has two really cool options.</p><p>On the <i>server </i>side, you can run "sudo sshd -T | sort" to have the SSH daemon read the relevant configs and then print out all of the actual values that it's using.
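</p><p>If you only care about the keepalive-related settings (which is where this story is headed), you can filter that output down to just those lines:</p><p><span style="font-family: courier; font-size: x-small;">sudo sshd -T | grep -E "clientalive|tcpkeepalive"</span></p><p>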
The "sshd -T" output merges in all of the unspecified defaults as well as all of the various options in "/etc/ssh/sshd_config" and "/etc/ssh/sshd_config.d", etc.</p><p>On the <i>client </i>side, you can run "ssh -G ${user}@${host} | sort", and it'll do the same thing, but for all of the client-side configs for that particular user and host combination (because maybe you have some custom stuff set up in your SSH config, etc.).</p><p>Now, in my case, it ended up being a keepalive issue. So, on the server side, here's what the relevant settings were:</p><p><span style="font-family: courier; font-size: x-small;">clientalivecountmax 0<br />clientaliveinterval 900<br />tcpkeepalive yes</span></p><p style="text-align: left;">On the client (which would disconnect sometimes), here's what the relevant settings were:</p><p><span style="font-family: courier; font-size: x-small;">serveralivecountmax 3<br />serveraliveinterval 0<br />tcpkeepalive yes</span></p><p style="text-align: left;">Here, you can see that the client (which is whatever the default Jenkins Kubernetes agent ended up being) enabled TCP keepalives, but it set "serveraliveinterval" to "0", which means that it wouldn't send any SSH-level keepalive packets at all.</p><p style="text-align: left;">According to the docs, the server <i>should </i>have sent out keepalives every 15 minutes, but whatever it was doing, the connection would drop after 15 minutes. Setting "serveraliveinterval" to "60" ended up solving my problem and allowed my SSH sessions to stay active indefinitely until the script was done with them.</p><h2 style="text-align: left;">Little bonus section</h2><p style="text-align: left;">My SSH command to set up the tunnel in the background was:</p><p style="text-align: left;"><span style="font-family: courier; font-size: x-small;">ssh -4 -f -L${localport}:${targetaddress}:${targetport} ${user}@${bastionhost} 'sleep infinity';</span></p><p style="text-align: left;">"-4" forces it to use an IPv4 address (relevant in my context), and "-f" puts the SSH command into the background before "sleep infinity" gets called, right after all the port forwarding is set up. "sleep infinity" ensures that the connection never closes on its own; the "sleep" command will do nothing forever.</p><p style="text-align: left;">(Obviously, I had the "-o ServerAliveInterval=60" option in there, too.)</p><p style="text-align: left;">With this, I could trivially have my container create an SSH session that allowed for port-forwarding, and that session would be available for the entirety of the container's lifetime (the entirety of the Jenkins build).</p>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-73689638305725250852022-03-01T15:19:00.002-05:002022-03-01T15:19:13.832-05:00QNAP, NFS, and Filesystem ACLs<p>I recently spent hours banging my head against a wall trying to figure out why my Plex server couldn't find some new media that I put on its volume in my QNAP.</p><p>tl;dr QNAP "Advanced Folder Permissions" turns on file access control lists (you'll need the "getfacl" and "setfacl" tools installed on Linux to mess with them). For more information, see <a href="https://www.qnap.com/en/how-to/faq/article/how-to-configure-sub-folders-acl-for-nfs-clients">this guide from QNAP</a>.</p><p>I must have turned this setting on when I rebuilt my NAS a while back, and it never mattered until I did some file operations with the File Manager or maybe just straight "cp"; I forget which (or both).
Plex refused to see the new file, and I tried debugging the indexer and all that other Plex stuff before realizing that while it could <i>list </i>the file, it couldn't <i>open </i>the file, even though its normal "ls -l" permissions looked fine.</p><p>Apparently the file access control list denied it, but I didn't even have "getfacl" or "setfacl" installed on my machine (and I had never even heard of this before), so I had no idea what was going on. I eventually installed those tools and verified that while the standard Linux permission looked fine, the ACL permissions did not.</p><p>"sudo chmod -R +r /path/to/folder" didn't solve my problem, but tearing out the ACL did: "sudo setfacl -b -R /path/to/folder"</p><p>Later, I eventually figured out that it was QNAP's "Advanced Folder Permissions" and just disabled that so it wouldn't bother me again.</p>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-42816701758833012212022-01-09T19:59:00.002-05:002022-01-09T19:59:30.762-05:00Moving (or renaming) a push-notification ServiceWorker<p>Service Workers (web workers, etc.) are a relatively new concept. They can do all kinds of cool things (primarily related to network requests), but they are also the mechanism by which a web site can receive push messages (via Web Push) and show them as OS notifications.</p><p>The general rule of Service Workers is to pick a file name (such as "/service-worker.js") and never, ever change it. That's cool, but sometimes you do need to change it.</p><p>In particular, I started my push messaging journey with <a href="https://github.com/googlearchive/platinum-push-messaging">"platinum-push-messaging"</a>, a now-defunct web component built by Google as part of the initial Polymer project. The promise was cool: just slap this HTML element on your page with a few parameters and boom: you have working push notifications.</p><p>When it came out, the push messaging spec was young, and no browsers fully supported its encrypted data payloads, so "platinum-push-messaging" did a lot of work to work around that limitation. As browsers improved to support the VAPID spec, "platinum-push-messaging" (along with all of the other "platinum" elements) were quietly deprecated and archived (around 2017).</p><p>This left me with a problem: a rotting push notification system that couldn't keep up with the spec and the latest browsers. I hacked the code to all hell to support VAPID and keep the element functioning, but I was just punting.</p><p>Apple ruined the declarative promise of the Polymer project by refusing to implement HTML imports, so the web components community adopted the NPM distribution model (and introduced a whole bunch of imperative Javascript drama and compilation tools). 
Anyway, no modern web components are installed with Bower anymore, so that left me with a deprecated Service Worker in a path that I wanted to get rid of: "bower_components/platinum-push-messaging/service-worker.js"</p><p>Here was my problem:</p><p></p><ol style="text-align: left;"><li>I wanted the push messaging Service Worker under my control at the top level of my application, "/push-service-worker.js".</li><li>I had hundreds of users who were receiving push notifications via this system, and the upgrade had to be seamless (users couldn't be forced to take any action).</li></ol><div>I ended up solving the problem by essentially performing a switcheroo:</div><div><ol style="text-align: left;"><li>I had my application store the Web Push subscription info in HTML local storage. This would be necessary later as part of the switcheroo.</li><li>I removed "bower_components/platinum-push-messaging/". Any existing clients would regularly attempt to update the service worker, but it would quietly fail, leaving the existing one running just fine.</li><li>I removed all references to "platinum-push-messaging" from my code. The existing Service Worker would continue to run (because that's what Service Workers do) and receive push messages (and show notifications).</li><li>I made my own push-messaging web component with my own service worker living at "/push-service-worker.js".</li><li>(This laid the framework for performing the switcheroo.)</li><li>Upon loading, the part of my application that used to include "platinum-push-messaging" did a migration, if necessary, <i>before</i> loading the new push-messaging component:</li><ol><li>It went through all the Service Workers and looked for any legacy ones (these had "$$platinum-push-messaging$$" in the scope). If it found any, it killed them.<br /><br />Note that the "$$platinum-push-messaging$$" in the scope was a cute trick by the web component: a page can only be controlled by one Service Worker, and the scope dictates what that Service Worker can control. By injecting a bogus "$$platinum-push-messaging$$" at the end of the scope, it ensured that the push-messaging Service Worker couldn't accidentally control any pages and get in the way of a main Service Worker.</li><li>Upon finding any legacy Service Workers, it would:</li><ol><li>Issue a delete to the web server for the old (legacy) subscription (which was stored in HTML local storage).</li><li>Tell the application to auto-enable push notifications.</li><li>Resume the normal workflow for the application.</li></ol></ol><li>The normal workflow for the application entailed loading the new push-messaging web component once the user was logged in. If a Service Worker was previously enabled, then it would remain active and enabled. Otherwise, the application wouldn't try to annoy users by asking them for push notifications.</li><li>After the new push-messaging web component was included, it would then check to see if it should be auto-enabled (it would only be auto-enabled as part of a successful migration).</li><ol><li>If it was auto-enabled, then it would enable push messaging (the user would have already given permission by virtue of having a legacy push Service Worker running). When the new push subscription was ready, it would post that information to the web server, and the user would have push messages working again, now using the new Service Worker. 
The switcheroo was complete.</li></ol></ol><div>That's a bit wordy for a simple switcheroo, but it was very important for me to ensure that my users never lost their push notifications as part of the upgrade. The simple version is: detect legacy Service Worker, kill legacy Service Worker, delete legacy subscription from web server, enable new Service Worker, and save new subscription to web server.</div></div><p></p><div>For any given client, the switcheroo happens only once. The moment that the legacy Service Worker has been killed, it'll never run again (so there's a chance that if the user killed the page in the milliseconds after the kill but before the save, then they'd lose their push notifications, but I viewed this as extremely unlikely; I could technically have stored a status variable, but it wasn't worth it). After that, it operates normally.</div><p></p><div>This means that there are two ways for a user to be upgraded:</div><div><ol style="text-align: left;"><li>They open up the application after it has been upgraded. The application prompts them to reload to upgrade if it detects a new version, but <i>eventually </i>the browser will do this on its own, typically after the device reboots or the browser has been fully closed.</li><li>They click on a push notification, which opens up the application (which is #1, above).</li></ol><div>So at this point, it's a waiting game. I have to maintain support for the switcheroo until all existing push subscriptions have been upgraded. The new ones have a flag set in the database, so I just need to wait until all subscriptions have the flag. Active users who are receiving push notifications will <i>eventually </i>click on one, so I made a note to revisit this and remove the switcheroo code once all of the legacy subscriptions have been removed.</div><p></p></div><div>I'm not certain what causes a new subscription to be generated (different endpoint, etc.), but I suspect that it has to do with the scope of the Service Worker (otherwise, how would it know, since service worker code can change frequently?). I played it safe and just assumed that the switcheroo would generate an entirely new subscription, so I deleted the legacy one no matter what and saved the new one no matter what.</div>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-27581455139235399182021-10-30T17:36:00.005-04:002021-10-30T17:37:24.278-04:00Troubleshooting a weird Nagios NRPE SSL/TLS error<p>We recently gained limited access to a customer data center in order to monitor some machines that our software is running on. For historical reasons, we use Nagios as our monitoring tool (yes, I know that it's 2021) and we use NRPE to monitor our Linux boxes (yes, I know that NRPE is deprecated in favor of NCPA).</p><p>We had to provide the customer with a list of source IP addresses and target ports (for example, 5666 for NRPE) as part of the process to get the VPN set up. <i>Foreshadowing: this will become relevant soon.</i></p><p>After getting NRPE installed on all of our machines, we noticed that Nagios was failing to connect to any of them.
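</p><p>NRPE logs to syslog, so depending on the distro, something like this will pull up its recent messages (the service name varies; on Debian/Ubuntu it's "nagios-nrpe-server", while Red Hat-style boxes usually call it just "nrpe"):</p><p><span style="font-family: courier; font-size: x-small;">journalctl -u nagios-nrpe-server --since "1 hour ago"</span></p><p>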
The NRPE logs all had the following errors:</p><p><span style="font-size: x-small;"><span style="font-family: courier;">Starting up daemon<br /></span><span style="font-family: courier;">Server listening on 0.0.0.0 port 5666.<br /></span><span style="font-family: courier;">Server listening on :: port 5666.<br /></span><span style="font-family: courier;">Warning: Daemon is configured to accept command arguments from clients!<br /></span><span style="font-family: courier;">Listening for connections on port 5666<br /></span><span style="font-family: courier;">Allowing connections from: 127.0.0.1,::1,[redacted]<br /></span><span style="font-family: courier;">Error: Network server getpeername() failure (107: Transport endpoint is not connected)<br /></span><span style="font-family: courier;">warning: can't get client address: Connection reset by peer<br /></span><span style="font-family: courier;">Error: (!log_opts) Could not complete SSL handshake with [redacted]: 5<br /></span><span style="font-family: courier;">warning: can't get client address: Connection reset by peer<br /></span><span style="font-family: courier;">Error: Network server getpeername() failure (107: Transport endpoint is not connected)<br /></span><span style="font-family: courier;">Error: Network server getpeername() failure (107: Transport endpoint is not connected)<br /></span><span style="font-family: courier;">warning: can't get client address: Connection reset by peer<br /></span><span style="font-family: courier;">Error: (!log_opts) Could not complete SSL handshake with [redacted]: 5<br /></span><span style="font-family: courier;">warning: can't get client address: Connection reset by peer<br /></span><span style="font-family: courier;">warning: can't get client address: Connection reset by peer</span></span></p><p>So, this is obviously an SSL/TLS problem.</p><p>However, everyone on the Internet basically says that this is a problem with the NRPE client machine (the Nagios source address isn't listed in "allowed_hosts", it's not set up for SSL correctly, you didn't compile it right, etc.).</p><p>After fighting with this for hours, we finally figured out what was wrong.</p><p>A hint was the "getpeername() failure"; if you open up the NRPE source code, this runs immediately after the connection is established. The only way that you could see this error ("Transport endpoint is not connected") is if the socket was closed between that initial connection and "getpeername".</p><p>Running "tcpdump" on both sides yielded the following findings:</p><p>On Nagios:</p><p></p><blockquote><span style="color: #999999;">Nagios → NRPE machine: SYN<br />NRPE machine → Nagios: SYN, ACK<br />Nagios → NRPE machine: ACK<br /></span>Nagios → NRPE machine: TLSv1 Client Hello<br />NRPE machine → Nagios: RST, ACK</blockquote><p></p><p>On the NRPE machine to be monitored:</p><p></p><blockquote><p><span style="color: #999999;">Nagios → NRPE machine: SYN<br />NRPE machine → Nagios: SYN, ACK<br />Nagios → NRPE machine: ACK<br /></span>Nagios → NRPE machine: RST, ACK</p><p></p></blockquote><p>Both machines agreed on the first 3 packets: the classic TCP handshake. However, they differed on the subsequent packets. Nagios sent a TLSv1 "Client Hello" packet and immediately had the connection closed by the NRPE machine. However, the NRPE machine did not see the TLSv1 "Client Hello" at all; rather, it saw that Nagios immediately closed the connection.</p><p>This is indicative of some trickery being done by the customer's equipment (firewall, VPN, etc.). 
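</p><p>If you want to reproduce this kind of capture yourself, something as simple as the following on each end is enough to see the handshake and whatever happens after it (5666 being the NRPE port):</p><p><span style="font-family: courier; font-size: x-small;">sudo tcpdump -i any -nn port 5666</span></p><p>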
From what I can tell, they're quietly stripping out any TLS packets and killing the connection if it finds any. They probably have an incorrect port rule set up for port 5666, but anyway, that's the problem here: the network infrastructure is tearing out the TLS packets and closing the connection.</p>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-42519413084001600422021-07-10T15:50:00.001-04:002021-07-10T15:53:00.877-04:00Migrating from a static volume to a storage pool in QNAP<p> I bought a QNAP TS-451+ NAS a number of years ago. At the time, you could only set up what are now called "static volumes"; these are volumes that are composed of a number of disks in some RAID configuration. After a firmware update, QNAP introduced "storage pools", which act as a layer in between the RAIDed disks and the volumes on top of them. Storage pools can do snapshots and some other fancy things, but the important thing here is that QNAP was pushing storage pools now, and I had a static volume.</p><p>I wanted to migrate from my old static volume to a new storage pool. I couldn't really find any examples of anyone who had performed such a migration successfully; most of the advice on the Internet was basically, "back up your stuff and reformat". Given the fact that my volume was almost full and that QNAP does not support an in-place migration, I figured that if I added on some extra storage in the form of an expansion unit, I could probably pull it off with minimal hassle.</p><p>(<a href="https://www.qnap.com/en-us/how-to/knowledge-base/article/can-i-convert-a-static-volume-to-thin-or-thick">The official QNAP docs</a> generally agree with this.)</p><p>tl;dr It was pretty easy to do, just a bit time-consuming. I'll also note that this was a lossless process (other than my NFS permissions); I didn't have to reinstall anything or restore any backups.</p><p>Here's the general workflow:</p><p></p><ol style="text-align: left;"><li>Attach the expansion unit.</li><li>Add the new disks to the expansion unit.</li><li>Create a new storage pool on the expansion unit.</li><li>Transfer each folder in the original volume to a new folder on the expansion unit.</li><li>Write down the NFS settings for the original volume's folders.</li><li>Delete the original volume.</li><li>Create a new storage pool with the original disks.</li><li>Create a new system volume on the main storage pool.</li><li>Create new volumes as desired on the main storage pool.</li><li>Transfer each folder from the expansion volume to the main volume.</li><li>Re-apply the NFS settings on the folders on the main storage pool's volumes.</li><li>Detach the expansion unit.</li></ol><div>Some details follow.</div><div><br /></div><div>QNAP sells expansion units that can act as additional storage pools and volumes, and the QNAP OS integrates them pretty well. I purchased a TS-004 and connected it to my TS-451+ NAS via USB. I had some new drives that I was planning to use to replace the drives currently in the NAS, so instead of doing that right away, I put them all in the expansion unit and created a new storage pool (let's call this the expansion storage pool).</div><div><br /></div><div>I had originally tried using File Station to copy and paste all of my folders to a new volume in the expansion unit, but I would get permission-related errors, and I didn't want to deal with individual files when there were millions to transfer. 
QNAP has an application called Hybrid Backup Sync, and one of the things that you can do is a 1-way sync "job" that lets you properly copy everything from one folder on one volume to another folder on another volume. So I created new top-level folders in the expansion volume and then used Hybrid Backup Sync to copy all of my data from the main volume to the expansion volume (it preserved all the file attributes, etc.).</div><div><br /></div><div>For more information how to use Hybrid Backup Sync to do this, see <a href="https://www.qnap.com/en/how-to/knowledge-base/article/how-to-move-shared-folders-to-a-new-volume">this article from QNAP</a>.</div><div><br /></div><div>(If you're coming from a static volume and you set up a storage pool on the expansion unit, then QNAP has a feature where you can transfer a folder on a static volume to a new volume in a storage pool, but this only works one way; you can't use this feature to transfer back from storage pool to storage pool, only from static volume to storage pool.)</div><div><br /></div><div>I then wrote down the NFS settings that I had for my folders on the main unit (it's pretty simple, but I did have some owner and whitelist configuration).</div><div><br /></div><div>Once I had everything of mine onto the expansion volume, I then deleted the main (system) volume. QNAP was okay with this and didn't complain at all. Some sites that I had read claimed that you'd have to reboot or reformat or something if you did this, but at least on modern QNAP OSes, it's fine with you deleting its system volume.</div><div><br /></div><div>For more information on deleting a volume, see <a href="https://www.qnap.com/en/how-to/knowledge-base/article/how-to-remove-a-storage-poolvolume">this article from QNAP</a>.</div><div><br /></div><div>I created a new storage pool with the main unit's existing disks, and then I created a small, thin volume on it to see what would happen. QNAP quickly decided that this new volume would be the new "system" volume, and it installed some applications on its own, and then it was done. My guess is that it installed whatever base config it needs to operate on that new volume and maybe transferred the few applications that I already had to it or something.</div><div><br /></div><div>(I then rebooted the QNAP just to make sure that everything was working, and it ended up being fine.)</div><div><br /></div><div>On the expansion unit, I renamed all of the top-level folders to end with "_expansion" so that I'd be able to tell them apart from the ones that I would make on the main unit.</div><div><br /></div><div>Then I used Hybrid Backup Sync to copy my folders from the expansion volume to the main volume. Once that was done, I modified the NFS settings on the main volume's folders to match what they had been originally.</div><div><br /></div><div>I tested the connections from all my machines that use the NAS, and then I detached and powered down the expansion unit. I restarted the NAS and tested the connections again, and everything was perfect. 
Now I had a storage pool with thin-provisioned volumes instead of a single, massive static volume.</div><p></p>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-23579290094646843322021-07-05T17:47:00.001-04:002021-07-05T17:49:04.826-04:00Working around App Engine's bogus file modification times in Go<p>When an <a href="https://cloud.google.com/appengine">App Engine</a> application is deployed, the files on the filesystem have their modification times "zeroed"; in this case, they are set to Tuesday, January 1, 1980 at 00:00:01 GMT (with a Unix timestamp of "315532801"). Oddly enough, this isn't January 1, 1970 (with a Unix timestamp of "0"), so they're adding 1 year and 1 second for some reason (probably to avoid actually zeroing out the date).</p><p>If you found your way here by troubleshooting, you may have seen this for your "Last-Modified" header:</p><p><span style="font-family: courier; font-size: x-small;">last-modified: Tue, 01 Jan 1980 00:00:01 GMT</span></p><p>There's an issue for this particular problem (currently they're saying that it's working as designed); to follow the issue or make a comment, see <a href="https://issuetracker.google.com/issues/168399701">issue 168399701</a>.</p><p>For App Engine in Go, I've historically bypassed the static files stuff and just had my application serve up the files with "<a href="https://golang.org/pkg/net/http/#FileServer">http.FileServer</a>", and I've disabled caching everywhere to play it safe ("Cache-Control: no-cache, no-store, must-revalidate"). Recently, I've begun to experiment with a "max-age" of 1-minute lined up on 1-minute boundaries so that I get a bit of help from the GCP proxy and its caching powers while not shooting myself in the foot allowing stale copies of my files to linger all over the Internet.</p><p>This caused me a huge amount of headache recently when my web application wasn't updating in production, despite being pushed for over 24 hours. It turns out that the browser (Chrome) was making a request by including the "If-Modified-Since" header, and my application was responding back with a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/304">304 Not Modified</a> response. No matter how many times my service worker tried to fetch the new data, the server kept telling it that what it had was perfect.</p><p>The default HTTP file server in some languages lets you tweak how it responds ("ETag", "Last-Modified", etc.), but not in Go. "http.FileServer" has no configuration options available to it.</p><p>What I ended up doing was wrapping "http.FileServer"'s "ServeHTTP" in another function; this function had two main goals:</p><p></p><ol style="text-align: left;"><li>Set up a weak ETag value using the expiration date (ideally, I'd use a strong value like the MD5 sum of the contents, but I didn't want to have to rewrite "http.FileServer" just for this).</li><li>Remove the request headers related to the modification time ("If-Modified-Since" and "If-Unmodified-Since"). 
"http.FileServer" definitely respects "If-Modified-Since", and because the modification time is bogus in App Engine, I figured that just quietly removing any headers related to that would keep things simple.</li></ol><div>Here's what I ended up with:</div><p></p><div><div><span style="font-family: courier; font-size: x-small;">staticHandler := http.StripPrefix("/", http.FileServer(http.Dir("/path/to/my/files")))</span></div><div><span style="font-family: courier; font-size: x-small;"><br /></span></div><span style="font-family: courier; font-size: x-small;">myHandler.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {</span></div><div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>// Cache all the static files aligned at the 1-minute boundary.</span></div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>expirationTime := time.Now().Truncate(1 * time.Minute).Add(1 * time.Minute)</span></div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>w.Header().Set("Cache-Control", fmt.Sprintf("public, max-age=%0.0f, must-revalidate", time.Until(expirationTime).Seconds()))</span></div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>w.Header().Set("ETag", fmt.Sprintf("W/\"exp_%d\"", expirationTime.Unix())) // The ETag is weak ("W/" prefix) because it'll be the same tag for all encodings.</span></div><div><span style="font-family: courier; font-size: x-small;"><br /></span></div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>// Strip the headers that `http.FileServer` will use that rely on modification time.</span></div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>// App Engine sets all of the timestamps to January 1, 1980.</span></div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>r.Header.Del("If-Modified-Since")</span></div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>r.Header.Del("If-Unmodified-Since")</span></div><div><span style="font-family: courier; font-size: x-small;"><br /></span></div><div><span style="font-family: courier; font-size: x-small;"><span style="white-space: pre;"> </span>staticHandler.ServeHTTP(w, r)</span></div><div><span style="font-family: courier; font-size: x-small;">})</span></div></div><div><br /></div><div>Anyway, I fought with this for two days before finally realizing what was going on, so hopefully this will let you work around App Engine's bogus file-modification times.</div>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-51117811609623862192021-04-15T16:27:00.001-04:002021-04-15T16:27:12.635-04:00Using "errors.Is" to detect "connection reset by peer" and work around it<p> I maintain an application that ties into <a href="https://www.emergencyreporting.com">Emergency Reporting</a> using their REST API. When an item is updated, I have a <a href="https://cloud.google.com/tasks/">Google Cloud Task</a> that attempts to publish a change to a web hook, which connects to the Emergency Reporting API and creates a new incident in that system. Because it's in Cloud Tasks, if the task fails for any reason, Cloud Tasks will attempt to retry the task until it succeeds. 
Cool.</p><p>I also have it set up to send any log messages at warning level or higher to a <a href="https://slack.com/">Slack</a> channel. Also cool.</p><p>However, in December of 2020, Emergency Reporting switched to some kind of Microsoft-managed authentication system for their API, and this has only brought problems. The most common of which is that the authentication API will frequently fail with a "connection reset by peer" error. My Emergency Reporting wrapper detects this and logs it; my web hook detects a sign-in failure and logs that; and the whole Cloud Task detects that the web hook has failed and logs that. Cloud Tasks automatically retries the task, which makes another post to the web hook, and everything succeeds the second time. But by now, I've accumulated a bunch of warnings in the Slack channel. Not cool.</p><p>So here's the thing: the Emergency Reporting API can fail for a lot of reasons, and I'd like to be notified when something important actually happens. But a standard, run-of-the-mill TCP "connection reset by peer" error is not important at all.</p><p>Here's an example of the kind of error that Go's <a href="https://golang.org/pkg/net/http/#Client.PostForm">http.Client.PostForm</a> returns in this case:</p><p><span style="font-family: courier; font-size: x-small;">Could not post form: Post https://login.emergencyreporting.com/login.emergencyreporting.com/B2C_1A_PasswordGrant/oauth2/v2.0/token: read tcp [fddf:3978:feb1:d745::c001]:33391->[2620:1ec:29::19]:443: read: connection reset by peer</span></p><p style="text-align: left;">Looking at the error, it looks like there are 4 layers of error:</p><ol style="text-align: left;"><li>The HTTP post</li><li>The particular TCP read</li><li>A generic "read"</li><li>A generic "connection reset by peer"</li></ol>What I really want to do in this case is detect a generic "connection reset by peer" error and quietly retry the operation, allowing all other errors to be handled as true errors. Doing string-comparison operations on error text is rarely a good idea, so what does that leave us with?<p></p><p style="text-align: left;"><a href="https://golang.org/doc/go1.13#error_wrapping">Go 1.13</a> adds support for "error wrapping", where one error can "wrap" another one, while still allowing programs to make decisions based on the "wrapped" error. You may call "<a href="https://golang.org/pkg/errors/#Is">errors.Is</a>" to determine if any error in an error chain matches a particular target.</p><p style="text-align: left;">Fortunately, all of the packages in this particular chain of errors utilize this feature. 
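</p><p style="text-align: left;">As a quick illustration of how wrapping chains behave (this is a toy example, not the actual Emergency Reporting code), an error wrapped with "fmt.Errorf" and the "%w" verb can still be matched by "errors.Is" no matter how many layers get stacked on top of it:</p><p><span style="font-family: courier; font-size: x-small;">package main<br /><br />import (<br />    "errors"<br />    "fmt"<br />    "syscall"<br />)<br /><br />func main() {<br />    // Pretend this is what the bottom of the network stack returned.<br />    var baseErr error = syscall.ECONNRESET<br /><br />    // Each layer wraps the error below it with the %w verb.<br />    readErr := fmt.Errorf("read: %w", baseErr)<br />    postErr := fmt.Errorf("Could not post form: %w", readErr)<br /><br />    // errors.Is unwraps the whole chain, so the top-level error still matches.<br />    fmt.Println(errors.Is(postErr, syscall.ECONNRESET)) // prints "true"<br />}</span></p><p style="text-align: left;">That unwrapping behavior is exactly what the check below relies on.</p><p style="text-align: left;">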
In particular, the <a href="https://golang.org/pkg/syscall/">syscall</a> package has a set of distinct <a href="https://golang.org/pkg/syscall/#Errno">Errno</a> errors for each low-level error, including "connection reset by peer" (ECONNRESET).<br /></p><p style="text-align: left;">This lets us do something like this:</p><p><span style="font-family: courier; font-size: x-small;">tokenResponse, err = client.GenerateToken()<br />if err != nil {<br /> // If this was a connection-reset error, then continue to the next retry.<br /> if errors.Is(err, syscall.ECONNRESET) {<br /> logrus.Info("Got back a syscall.ECONNRESET from Emergency Reporting.")<br /> // [attempt to retry the operation]<br /> } else {<br /> // This was some other kind of error that we can't handle.<br /> // [log a proper error message and fail]<br /> }<br />}</span></p><p>Since using "errors.Is" to detect the "connection reset by peer" error, I haven't received a single annoying, pointless error message in my Slack channel. I did have to spend a bit of time trying to figure out what that ultimate, underlying error was, but after that, it's been working flawlessly.</p>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-44054987897420669422021-01-25T15:12:00.001-05:002021-01-25T15:12:29.448-05:00Using LDAP groups to limit access to a Radius server (freeRADIUS 3.0)<p><i>Note: this is an updated version of <a href="https://blog.sensecodons.com/2020/12/using-ldap-groups-to-limit-to-radius.html">a prior entry</a> for freeRADIUS 3.0.</i></p><p>Anytime I need to create a VPN (to my home network, to my AWS network, etc.), I use <a href="https://www.softether.org/">SoftEther</a>. SoftEther is OpenVPN-compatible, supports L2TP/IPsec, and has some neat settings around VPN over ICMP and DNS. Anyway, once you get it set up, it generally just works (except for the cronjob that you need to make to trim its massive log files daily).</p><p>At work, we use LDAP for our user authentication and permissions, but SoftEther doesn't support LDAP. 
It does, however, support Radius, and <a href="https://freeradius.org/">freeRADIUS</a> supports using LDAP as a module, so you can easily set up a quick Radius proxy for LDAP.</p><h2>Quick recap on setting up freeRADIUS with LDAP</h2><div>I'm assuming that you already have an LDAP server.</div><div><br /></div><div>Install freeRADIUS and the LDAP module.</div><div><span style="font-family: courier; font-size: x-small;">sudo apt install freeradius freeradius-ldap</span></div><div><span style="font-family: courier; font-size: x-small;">sudo systemctl enable freeradius</span></div><div><span style="font-family: courier; font-size: x-small;">sudo systemctl start freeradius</span></div><div><br /></div><div>Enable the LDAP module via symlink:</div><div><span style="font-family: courier; font-size: x-small;">ln -sfn ../mods-available/ldap /etc/freeradius/3.0/mods-enabled/ldap</span></div><div><br /></div><div>Then turn on the LDAP module by editing <span style="font-family: courier; font-size: x-small;">/etc/freeradius/3.0/sites-enabled/default</span> and uncommenting the "ldap" line under the "authorize" block.</div><div><span style="font-family: courier; font-size: x-small;">authorize {</span></div><div><span style="font-family: courier; font-size: x-small;">...</span></div><div><span style="font-family: courier; font-size: x-small;"> ldap</span></div><div><span style="font-family: courier; font-size: x-small;">...</span></div><div><br /></div><div>You'll need to add an "if" statement to set the "Auth-Type"; do this immediately after that "ldap" line.</div><div><div><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: x-small;">if ((ok || updated) && User-Password) {</span></div><div><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: small;">update {</span></div><div><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: small;">control:Auth-Type := ldap</span></div><div><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: small;">}</span></div><div><span style="font-family: courier; font-size: small;"> </span><span style="font-family: courier; font-size: x-small;">}</span></div></div><div><br /></div><div>And the same for the "Auth-Type LDAP" block.</div><div><span style="font-family: courier; font-size: x-small;">authorize {</span></div><div><span style="font-family: courier; font-size: x-small;">...</span></div><div><span style="font-family: courier; font-size: x-small;"> Auth-Type LDAP {</span></div><div><span style="font-family: courier; font-size: x-small;"> ldap</span></div><div><span style="font-family: courier; font-size: x-small;"> }</span></div><div><span style="font-family: courier; font-size: x-small;">...</span></div><div><br /></div><div>Cool; at this point, freeRADIUS will use whatever LDAP setup is in the <span style="font-family: courier; font-size: x-small;">/etc/freeradius/3.0/mods-enabled/ldap</span> file. 
It won't work (because it's not set up for your LDAP server), that's all that you need in order to back your Radius server with your LDAP server.</div><div><br /></div><div>Next up, we'll look at configuring it to actually talk to your LDAP server.</div><h2>Configuring the LDAP module</h2><div><span style="font-family: courier; font-size: x-small;">/etc/freeradius/3.0/mods-enabled/ldap</span> is where the LDAP configuration lives. In order to understand exactly what's going on, you should know a few things.</div><div><ol><li><a href="https://wiki.freeradius.org/config/run_time_variables">Run-time variables</a>, like the current user name, are written as <span style="font-family: courier; font-size: x-small;">%{Variable-Name}</span>. For example, the current user name is <span style="font-family: courier; font-size: x-small;">%{User-Name}</span>.<br /></li><li>Similar to shell variables, you can have conditional values. The basic syntax is <span style="font-family: courier; font-size: x-small;">%{%{Variable-1}:-${Variable-2}}</span>. A typical pattern that you'll see is using the "stripped" user name (the user name without any realm information), but if that's not defined, then use the actual user name: <span style="font-family: courier; font-size: x-small;">%{%{Stripped-User-Name}:-%{User-Name}}</span></li></ol><div>For your basic LDAP integration (if you provide a valid username and password, you can sign in), you'll need to set the following values in the "ldap" block:</div></div><div><ol><li><span style="font-family: courier; font-size: x-small;">server</span>; this is the hostname or address of your server. If you're running freeRADIUS on the same LDAP server, then this will be "localhost".</li><li><span style="font-family: courier; font-size: x-small;">identity</span>; this is the DN for the "bind" user. That's the user that freeRADIUS will log in as in order to search the directory tree and do its LDAP stuff. This is typically a read-only user.</li><li><span style="font-family: courier; font-size: x-small;">password</span>; this is the password for the user configured in <span style="font-family: courier; font-size: x-small;">identity</span>.</li><li><span style="font-family: courier; font-size: x-small;">base_dn</span>; this is the base DN to use for all user searches. It's usually something like <span style="font-family: courier; font-size: x-small;">dc=example,dc=com</span>, but that'll depend on your LDAP setup. You'll generally want to set this as the base for all of your users (maybe something like <span style="font-family: courier; font-size: x-small;">ou=users,dc=example,dc=com</span>, etc.).</li></ol><div><div>Here's an example that assumes that your users are all under <span style="font-family: courier; font-size: x-small;">ou=users,dc=example,dc=com</span>:</div><div><span style="font-family: courier; font-size: x-small;">server = "my-ldap-server.example.com"<br />identity = "uid=my-bind-user,ou=service-users,dc=example,dc=com"<br />password = "abc123"<br />base_dn = "ou=users,dc=example,dc=com"</span></div></div><h2 style="text-align: left;">Users</h2><div>You'll also need to set up user-level things in the "user" block:</div><ol><li><span style="font-family: courier; font-size: x-small;">filter</span>; this is the LDAP search condition that freeRADIUS will use to try to find the matching LDAP user for the user name that just tried to sign in via Radius. This is where run-time variables will come into play. 
For out-of-the-box OpenLDAP, something like this will generally work: <span style="font-family: courier; font-size: x-small;">(uid=%{%{Stripped-User-Name}:-%{User-Name}})</span>. What this means is look for an entity in LDAP (under the base DN defined in <span style="font-family: courier; font-size: x-small;">basedn</span>) with a <span style="font-family: courier; font-size: x-small;">uid</span> property of the Radius user name. Yes, you need the surrounding parentheses. No, I don't make the rules.</li></ol><div>Here's an example that uses "uid" for the user name.</div></div><div><span style="font-family: courier; font-size: x-small;">filter = "(uid=%{%{Stripped-User-Name}:-%{User-Name}})"</span></div><div><br /></div><div>Remember, <span style="font-family: courier; font-size: x-small;">filter</span> can be any LDAP filter, so if there were a property that you also wanted to check (such as <span style="font-family: courier; font-size: x-small;">isAllowedToDoRadius</span> or something), then you could check for that, as well. For example:</div><div><span style="font-family: courier; font-size: x-small;">filter = "(&(uid=%{%{Stripped-User-Name}:-%{User-Name}})(isAllowedToDoRadius=yes))"</span></div><h2>Filtering by group</h2><div>So, that'll let any LDAP user authenticate with Radius. Maybe you want that, maybe you don't. In my case, I have a whole bunch of users, but I only want a small subset to be able to VPN in using SoftEther. I added those users to the "vpn-users" group in LDAP.</div><div><br /></div><div>Note that there are two general grouping strategies in LDAP:</div><div><ol><li>Groups-have-users; in this strategy, the group entity lists the users within the group. This is the default OpenLDAP strategy.</li><li>Users-have-groups; in this strategy, the user entity lists the groups that it belongs to.</li></ol><div>If you want to have freeRADIUS respect your groups, you'll need to set the following in <span style="font-family: courier; font-size: x-small;">/etc/freeradius/3.0/mods-enabled/ldap</span> in the "groups" block:</div></div><div><ol><li><span style="font-family: courier; font-size: x-small;">name_attribute = cn</span> (which turns on tracking groups); and</li><li>One of these two options, which each correspond to one of the LDAP grouping strategies:</li><ol><li><span style="font-family: courier; font-size: x-small;">membership_filter</span>; this is an LDAP filter to use to query for all of the groups that the user belongs to.</li><li><span style="font-family: courier; font-size: x-small;">membership_attribute</span>; this is the property on the user entity that lists the groups that the user belongs to.</li></ol></ol><div>If your groups have users, this might look like:</div></div><div><div><span style="font-family: courier; font-size: x-small;">name_attribute = cn</span></div><div><span style="font-family: courier; font-size: x-small;">membership_filter = "(&(objectClass=posixGroup)(memberUid=%{%{Stripped-User-Name}:-%{User-Name}}))"</span></div></div><div><br /></div><div>If your users have groups, this might look like:</div><div><span style="font-family: courier; font-size: x-small;">name_attribute = cn<br />membership_attribute = groupName</span></div><div><br /></div><div>With that set up, freeRADIUS will now <i>know </i>which groups the user belongs to, but it won't do anything with them.</div><div><br /></div><div>The last step is to set up some group rules in <span style="font-family: courier; font-size: x-small;">/etc/freeradius/3.0/users</span>. 
There will probably be a few entries in that file already, but by default, none of them will be LDAP-related. So, at the very bottom, add the LDAP group rules.</div><div><br /></div><div><i>Note: In my case, this file was a symlink to "mods-config/files/authorize". The symlink was a convenience for backward-compatibility in editing the config files; freeRADIUS doesn't actually load "users"; rather, it loads "mods-config/files/authorize", so make sure that you're actually modifying the correct file.</i></div><div><br /></div><div>The simplest grouping rules will look like this:</div><div><div><span style="font-family: courier; font-size: x-small;">DEFAULT LDAP-Group == "your-group-name-here"</span></div><div><span style="font-family: courier; font-size: x-small;">DEFAULT Auth-Type := Reject</span></div><div><span style="font-family: courier; font-size: x-small;"> Reply-Message = "Sorry, you're not part of an authorized group."</span></div></div><div><br /></div><div>This generally means: you have to be a member of "your-group-name-here" or else you'll be rejected (and here's the message to send you).</div><div><br /></div><div>In my case, my group is "vpn-users", so it looks like this:</div><div><div><span style="font-family: courier; font-size: x-small;">DEFAULT LDAP-Group == "vpn-users", Auth-Type := Accept</span></div><div><span style="font-family: courier; font-size: x-small;">DEFAULT Auth-Type := Reject</span></div><div><span style="font-family: courier; font-size: x-small;"> Reply-Message = "Sorry, you're not part of an authorized group."</span></div></div><div><br /></div><div>Once that's done, restart freeradius and you'll be good to go.</div><div><span style="font-family: courier; font-size: x-small;">sudo systemctl restart freeradius</span></div><div><br /></div><div>To test to see if it worked, you can run the radtest command:</div><div><span style="font-family: courier; font-size: x-small;">radtest -x ${username} ${password} ${address} ${port} ${secret}</span></div><div><br /></div><div>For example, in our case, this might look like:</div><div><span style="font-family: courier; font-size: x-small;">radtest -x some-user abc123 my-radius-server.example.com 1812 the-gold-is-under-the-bridge</span></div><div><br /></div><div>On success, you'll see something like:</div><div><span style="font-family: courier; font-size: x-small;">rad_recv: Access-Accept packet</span></div><div><br /></div><div>On failure, you'll see something like:</div><div><span style="font-family: courier; font-size: x-small;">rad_recv: Access-Reject packet</span></div><div><br /></div><div>Hopefully this helped a bit; I struggle every time I need to do <i>anything </i>with LDAP or Radius. It's always really hard to find the documentation for what I'm looking for.</div>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-31556280085473873002020-12-22T17:58:00.009-05:002020-12-23T09:53:18.977-05:00Using LDAP groups to limit access to a Radius server<p>Anytime I need to create a VPN (to my home network, to my AWS network, etc.), I use <a href="https://www.softether.org/">SoftEther</a>. SoftEther is OpenVPN-compatible, supports L2TP/IPsec, and has some neat settings around VPN over ICMP and DNS. Anyway, once you get it set up, it generally just works (except for the cronjob that you need to make to trim its massive log files daily).</p><p>At work, we use LDAP for our user authentication and permissions, but SoftEther doesn't support LDAP. 
It does, however, support Radius, and <a href="https://freeradius.org/">freeRADIUS</a> supports using LDAP as a module, so you can easily set up a quick Radius proxy for LDAP.</p><h2 style="text-align: left;">Quick recap on setting up freeRADIUS with LDAP</h2><div>I'm assuming that you already have an LDAP server.</div><div><br /></div><div>Install freeRADIUS and the LDAP module.</div><div><span style="font-family: courier; font-size: x-small;">sudo apt install freeradius freeradius-ldap</span></div><div><span style="font-family: courier; font-size: x-small;">sudo systemctl enable freeradius</span></div><div><span style="font-family: courier; font-size: x-small;">sudo systemctl start freeradius</span></div><div><br /></div><div>Then turn on the LDAP module by editing <span style="font-family: courier; font-size: x-small;">/etc/freeradius/sites-enabled/default</span> and uncommenting the "ldap" line under the "authorize" block.</div><div><span style="font-family: courier; font-size: x-small;">authorize {</span></div><div><span style="font-family: courier; font-size: x-small;">...</span></div><div><span style="font-family: courier; font-size: x-small;"> ldap</span></div><div><span style="font-family: courier; font-size: x-small;">...</span></div><div><br /></div><div>And the same for the "Auth-Type LDAP" block.</div><div><span style="font-family: courier; font-size: x-small;">authorize {</span></div><div><span style="font-family: courier; font-size: x-small;">...</span></div><div><span style="font-family: courier; font-size: x-small;"> Auth-Type LDAP {</span></div><div><span style="font-family: courier; font-size: x-small;"> ldap</span></div><div><span style="font-family: courier; font-size: x-small;"> }</span></div><div><span style="font-family: courier; font-size: x-small;">...</span></div><div><br /></div><div>Cool; at this point, freeRADIUS will use whatever LDAP setup is in the <span style="font-family: courier; font-size: x-small;">/etc/freeradius/modules/ldap</span> file. It won't work (because it's not set up for your LDAP server), that's all that you need in order to back your Radius server with your LDAP server.</div><div><br /></div><div>Next up, we'll look at configuring it to actually talk to your LDAP server.</div><h2 style="text-align: left;">Configuring the LDAP module</h2><div><span style="font-family: courier; font-size: x-small;">/etc/freeradius/modules/ldap</span> is where the LDAP configuration lives. In order to understand exactly what's going on, you should know a few things.</div><div><ol style="text-align: left;"><li><a href="https://wiki.freeradius.org/config/run_time_variables">Run-time variables</a>, like the current user name, are written as <span style="font-family: courier; font-size: x-small;">%{Variable-Name}</span>. For example, the current user name is <span style="font-family: courier; font-size: x-small;">%{User-Name}</span>.<br /></li><li>Similar to shell variables, you can have conditional values. The basic syntax is <span style="font-family: courier; font-size: x-small;">%{%{Variable-1}:-${Variable-2}}</span>. 
A typical pattern that you'll see is using the "stripped" user name (the user name without any realm information), but if that's not defined, then use the actual user name: <span style="font-family: courier; font-size: x-small;">%{%{Stripped-User-Name}:-%{User-Name}}</span></li></ol><div>For your basic LDAP integration (if you provide a valid username and password, you can sign in), you'll need to set the following values in the "ldap" block:</div></div><div><ol style="text-align: left;"><li><span style="font-family: courier; font-size: x-small;">server</span>; this is the hostname or address of your server. If you're running freeRADIUS on the same LDAP server, then this will be "localhost".</li><li><span style="font-family: courier; font-size: x-small;">identity</span>; this is the DN for the "bind" user. That's the user that freeRADIUS will log in as in order to search the directory tree and do its LDAP stuff. This is typically a read-only user.</li><li><span style="font-family: courier; font-size: x-small;">password</span>; this is the password for the user configured in <span style="font-family: courier; font-size: x-small;">identity</span>.</li><li><span style="font-family: courier; font-size: x-small;">basedn</span>; this is the base DN to use for all user searches. It's usually something like <span style="font-family: courier; font-size: x-small;">dc=example,dc=com</span>, but that'll depend on your LDAP setup. You'll generally want to set this as the base for all of your users (maybe something like <span style="font-family: courier; font-size: x-small;">ou=users,dc=example,dc=com</span>, etc.).</li><li><span style="font-family: courier; font-size: x-small;">filter</span>; this is the LDAP search condition that freeRADIUS will use to try to find the matching LDAP user for the user name that just tried to sign in via Radius. This is where run-time variables will come into play. For out-of-the-box OpenLDAP, something like this will generally work: <span style="font-family: courier; font-size: x-small;">(uid=%{%{Stripped-User-Name}:-%{User-Name}})</span>. What this means is look for an entity in LDAP (under the base DN defined in <span style="font-family: courier; font-size: x-small;">basedn</span>) with a <span style="font-family: courier; font-size: x-small;">uid</span> property of the Radius user name. Yes, you need the surrounding parentheses. No, I don't make the rules.</li></ol><div>Here's an example that assumes that your users are all under <span style="font-family: courier; font-size: x-small;">ou=users,dc=example,dc=com</span> and have a <span style="font-family: courier; font-size: x-small;">uid</span> property that is their user name:</div></div><div><span style="font-family: courier; font-size: x-small;">server = "my-ldap-server.example.com"<br />identity = "uid=my-bind-user,ou=service-users,dc=example,dc=com"<br />password = "abc123"<br />basedn = "ou=users,dc=example,dc=com"<br />filter = "(uid=%{%{Stripped-User-Name}:-%{User-Name}})"</span></div><div><br /></div><div>Remember, <span style="font-family: courier; font-size: x-small;">filter</span> can be any LDAP filter, so if there were a property that you also wanted to check (such as <span style="font-family: courier; font-size: x-small;">isAllowedToDoRadius</span> or something), then you could check for that, as well. 
For example:</div><div><span style="font-family: courier; font-size: x-small;">filter = "(&(uid=%{%{Stripped-User-Name}:-%{User-Name}})(isAllowedToDoRadius=yes))"</span></div><h2 style="text-align: left;">Filtering by group</h2><div>So, that'll let any LDAP user authenticate with Radius. Maybe you want that, maybe you don't. In my case, I have a whole bunch of users, but I only want a small subset to be able to VPN in using SoftEther. I added those users to the "vpn-users" group in LDAP.</div><div><br /></div><div>Note that there are two general grouping strategies in LDAP:</div><div><ol style="text-align: left;"><li>Groups-have-users; in this strategy, the group entity lists the users within the group. This is the default OpenLDAP strategy.</li><li>Users-have-groups; in this strategy, the user entity lists the groups that it belongs to.</li></ol><div>If you want to have freeRADIUS respect your groups, you'll need to set the following in <span style="font-family: courier; font-size: x-small;">/etc/freeradius/modules/ldap</span>:</div></div><div><ol style="text-align: left;"><li><span style="font-family: courier; font-size: x-small;">groupname_attribute = cn</span> (which turns on tracking groups); and</li><li>One of these two options, which each correspond to one of the LDAP grouping strategies:</li><ol><li><span style="font-family: courier; font-size: x-small;">groupmembership_filter</span>; this is an LDAP filter to use to query for all of the groups that the user belongs to.</li><li><span style="font-family: courier; font-size: x-small;">groupmembership_attribute</span>; this is the property on the user entity that lists the groups that the user belongs to.</li></ol></ol><div>If your groups have users, this might look like:</div></div><div><div><span style="font-family: courier; font-size: x-small;">groupname_attribute = cn</span></div><div><span style="font-family: courier; font-size: x-small;">groupmembership_filter = "(&(objectClass=posixGroup)(memberUid=%{%{Stripped-User-Name}:-%{User-Name}}))"</span></div></div><div><br /></div><div>If your users have groups, this might look like:</div><div><div><span style="font-family: courier; font-size: x-small;">groupname_attribute = cn<br />groupmembership_attribute = groupName</span></div></div><div><br /></div><div>With that set up, freeRADIUS will now <i>know </i>which groups the user belongs to, but it won't do anything with them.</div><div><br /></div><div>The last step is to set up some group rules in <span style="font-family: courier; font-size: x-small;">/etc/freeradius/users</span>. There will probably be a few entries in that file already, but by default, none of them will be LDAP-related. 
So, at the very bottom, add the LDAP group rules.</div><div><br /></div><div>The simplest grouping rules will look like this:</div><div><div><span style="font-family: courier; font-size: x-small;">DEFAULT LDAP-Group == "your-group-name-here"</span></div><div><span style="font-family: courier; font-size: x-small;">DEFAULT Auth-Type := Reject</span></div><div><span style="font-family: courier; font-size: x-small;"> Reply-Message = "Sorry, you're not part of an authorized group."</span></div></div><div><br /></div><div>This generally means: you have to be a member of "your-group-name-here" or else you'll be rejected (and here's the message to send you).</div><div><br /></div><div>In my case, my group is "vpn-users", so it looks like this:</div><div><div><span style="font-family: courier; font-size: x-small;">DEFAULT LDAP-Group == "vpn-users"</span></div><div><span style="font-family: courier; font-size: x-small;">DEFAULT Auth-Type := Reject</span></div><div><span style="font-family: courier; font-size: x-small;"> Reply-Message = "Sorry, you're not part of an authorized group."</span></div></div><div><br /></div><div>Once that's done, restart freeradius and you'll be good to go.</div><div><span style="font-family: courier; font-size: x-small;">sudo systemctl restart freeradius</span></div><div><br /></div><div>To test to see if it worked, you can run the radtest command:</div><div><span style="font-family: courier; font-size: x-small;">radtest -x ${username} ${password} ${address} ${port} ${secret}</span></div><div><br /></div><div>For example, in our case, this might look like:</div><div><span style="font-family: courier; font-size: x-small;">radtest -x some-user abc123 my-radius-server.example.com 1812 the-gold-is-under-the-bridge</span></div><div><br /></div><div>On success, you'll see something like:</div><div><span style="font-family: courier; font-size: x-small;">rad_recv: Access-Accept packet</span></div><div><br /></div><div>On failure, you'll see something like:</div><div><span style="font-family: courier; font-size: x-small;">rad_recv: Access-Reject packet</span></div><div><br /></div><div>Hopefully this helped a bit; I struggle every time I need to do <i>anything </i>with LDAP or Radius. It's always really hard to find the documentation for what I'm looking for.</div>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-42192219808730939262020-12-07T18:07:00.004-05:002020-12-07T18:08:07.348-05:00Working with the Google Datastore emulator<p> I do a good chunk of my business in Google App Engine; you package up your web application, send it to GCP, and then it takes care of scaling and uptime and all that stuff.</p><p>When I started out in 2014, I created my main application in Java because that was the least-crappy language that was supported. However, in 2020, there are a whole lot more languages (in particular: Go). I've slowly been working on porting my application from Java 8 to Go 1.14. Along the way, I've run into some really annoying issues.</p><p>For today, I'm going to be focusing on the Datastore emulator. In "old" App Engine (Java 8, Go 1.11, Python 2, etc.), they gave you a whole emulator suite. Your application ran inside of that suite, and you had fake Google-based App Engine authentication, inbound e-mail, and a Datastore emulator that also had a web UI that you could use to see your entities and manipulate them. 
The Datastore emulator's web UI wasn't as good as the current one that you get in production, but it was good enough to use for development.</p><p>Well, in "new" App Engine, the emulator suite is gone, and now you have to emulate or mock every aspect of App Engine that you plan on using. It's not a huge deal, but it is a bit inconvenient. In particular, you now have to <a href="https://cloud.google.com/datastore/docs/tools/datastore-emulator">start your own Datastore emulator</a>.</p><p>It's easy to start:</p><p><span style="font-family: courier; font-size: x-small;">gcloud config set project &lt;your-project-id&gt;;<br />gcloud beta emulators datastore start;</span></p><p>There are some environment variables that you'll need to export for the various libraries to detect and use instead of the production instance; run this to see them:</p><p><span style="font-family: courier; font-size: x-small;">gcloud beta emulators datastore env-init;</span></p><p>That part is fine.</p><p>There are also two halfway-decent third-party web UIs for the Datastore emulator:</p><p></p><ol style="text-align: left;"><li><a href="https://github.com/GabiAxel/google-cloud-gui">https://github.com/GabiAxel/google-cloud-gui</a></li><li><a href="https://github.com/streamrail/dsui">https://github.com/streamrail/dsui</a></li></ol><p></p><p>I fought for <i>hours </i>trying to figure out why neither of those two web UIs worked. Neither would show any namespaces (and thus, neither would show any entities).</p><p>The short answer is that despite what the Datastore emulator <i>claims </i>it's using for the project ID, the only thing that it actually uses is "dummy-emulator-datastore-project".</p><p>I got a hint about it by poking around in the emulator's data file, and I got some confirmation in this file, which is the only thing on the Internet at the time of this writing that references that string: <a href="https://code.googlesource.com/gocloud/+/master/datastore/datastore.go">https://code.googlesource.com/gocloud/+/master/datastore/datastore.go</a></p><p>So, if you start the Datastore emulator according to the instructions and either of those two web UIs isn't working, try setting the project ID to "dummy-emulator-datastore-project".</p><p></p><ol style="text-align: left;"><li>In "google-cloud-gui", you set the project ID in the UI when you hit the "+" button to create a new project.</li><li>In "dsui", you set the project ID using the "--projectId" flag.</li></ol><div><br /></div><p></p>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-58527446986196286262020-07-18T01:27:00.002-04:002020-07-18T01:28:25.564-04:00Google Cloud Functions Issues Upgrading From Go 1.11 To 1.13<div>I use Google Cloud Functions with Go. However, I upgraded from Go 1.11 to Go 1.13 (because Go 1.11 is being deprecated) and ran into some annoying, undocumented issues.</div><h2 style="text-align: left;">Static Files And The Current Working Directory</h2><div>One of my Cloud Functions acts as a tiny web server; it has a few static HTML files that it serves in addition to its dynamic things.</div><div><br /></div><div>In Go 1.11, Cloud Functions put the static files (and all the source files, for that matter) in the working directory of the function. This (1) makes sense, and (2) makes testing easy.</div><div><br /></div><div>However, in Go 1.13, Cloud Functions puts the static files (and all of the source files) in the <font face="courier" size="2">./serverless_function_source_code</font> directory. Why? 
Who knows. All that mattered is that after a simple version upgrade, all of my stuff broke because it couldn't find files that it was able to find before the upgrade.</div><div><br /></div><div>I found that using a <font face="courier" size="2">sync.Once</font> to attempt to change the current working directory (if necessary) is a fairly clean backward-compatible way of handling this issue.</div><div><br /></div><div>Here's an example; it's fairly verbose, but you could rip out most of the logging if you don't want or need it.</div><div><br /></div><div><div><font face="courier" size="2">// GoogleCloudFunctionSourceDirectory is where Google Cloud will put the source code that was uploaded.</font></div><div><font face="courier" size="2">//</font></div><div><font face="courier" size="2">const GoogleCloudFunctionSourceDirectory = "serverless_function_source_code"</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2">// once is an object that will only execute its function one time.</font></div><div><font face="courier" size="2">//</font></div><div><font face="courier" size="2">// Because we want to log during our initialization, we need to handle this in a non-standard</font></div><div><font face="courier" size="2">// function and keep track of our initialization status.</font></div><div><font face="courier" size="2">var once sync.Once</font></div></div><div><font face="courier" size="2"><br /></font></div><div><div><font face="courier" size="2">// Initialize initializes the application.</font></div><div><font face="courier" size="2">//</font></div><div><font face="courier" size="2">// Primarily, this changes the current working directory.</font></div><div><font face="courier" size="2">func Initialize(log *logrus.Logger) {</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Infof("Initializing the application.")</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>path, err := os.Getwd()</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>if err != nil {</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Warnf("Could not find the current working directory: %v", err)</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>}</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Infof("Current working directory: %s", path)</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Infof("Looking for top-level source directory: %s", GoogleCloudFunctionSourceDirectory)</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>fileInfo, err := os.Stat(GoogleCloudFunctionSourceDirectory)</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>if err == nil && fileInfo.IsDir() {</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Infof("Found top-level source directory: %s", GoogleCloudFunctionSourceDirectory)</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>err = os.Chdir(GoogleCloudFunctionSourceDirectory)</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>if err != nil {</font></div><div><font face="courier" size="2"><span 
style="white-space: pre;"> </span>log.Warnf("Could not change to directory %q: %v", GoogleCloudFunctionSourceDirectory, err)</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>}</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>}</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Infof("Initialization complete.")</font></div><div><font face="courier" size="2">}</font></div></div><div><font face="courier" size="2"><br /></font></div><div><div><font face="courier" size="2">// CloudFunction is an HTTP Cloud Function with a request parameter.</font></div><div><font face="courier" size="2">func CloudFunction(w http.ResponseWriter, r *http.Request) {</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log := logrus.New()</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>// Initialize our application if we haven't already.</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>once.Do(func() { Initialize(log) })</font></div></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>// YOUR CLOUD FUNCTION LOGIC HERE</font></div><div><font face="courier" size="2">}</font></div><div><br /></div><div>For more information, see <a href="https://cloud.google.com/functions/docs/concepts/exec#file_system">the Cloud Functions concepts docs</a>.</div><h2 style="text-align: left;">Logging And Environment Variables</h2><div>For whatever reason, Cloud Functions with Go don't log at anything other than the "default" log level; this means that all of my carefully crafted log messages all just get dumped into the logs at the same severity.</div><div><br /></div><div>I've been using <a href="https://github.com/tekkamanendless/gcfhook">gcfhook</a> with <a href="https://github.com/sirupsen/logrus">logrus</a> to get around this, but it's not an ideal solution. That combination works by nullifying all output of the application and then adding a logrus hook that connects to the StackDriver API to send proper logs over the network. It works fine, but it's silly to have to make a network connection to a logging API when the application itself can output directly.</div><div><br /></div><div>As of Go 1.13, Cloud Functions will no longer set the <font face="courier" size="2">FUNCTION_NAME</font>, <font face="courier" size="2">FUNCTION_REGION</font>, and <font face="courier" size="2">GCP_PROJECT</font> environment variables. This is a problem because we need those three pieces of information in order to use the StackDriver API to send the log messages. You <i>could </i>publish those environment variables back as part of your deployment, but I'd prefer not to.</div><div><br /></div><div>Fortunately, Cloud Functions can now parse (poorly documented) JSON-formatted lines from <font face="courier" size="2">stdout</font> and <font face="courier" size="2">stderr</font>, resulting in proper log messages with severities. The Cloud Functions docs refer to this as <a href="https://cloud.google.com/logging/docs/structured-logging">"structured logging"</a>, but the docs don't seem to apply correctly. 
Cloud Run has <a href="https://cloud.google.com/run/docs/logging#special-fields">a document</a> on how these JSON-formatted lines should look, but it's still a bit hazy.</div><div><br /></div><div>Anyway, the <a href="https://github.com/tekkamanendless/gcfstructuredlogformatter">gcfstructuredlogformatter</a> package introduces a logrus <i>formatter</i> that outputs JSON instead of plain text for logs. This eliminates the need for the extra environment variables and generally simplifies the logging workflow. It should only be a couple of lines of code to sub out gcfhook for gcfstructuredlogformatter.</div><div><br /></div><div>Here's an example:</div><div><br /></div><div><div><font face="courier" size="2">// CloudFunction is an HTTP Cloud Function with a request parameter.</font></div><div><font face="courier" size="2">func CloudFunction(w http.ResponseWriter, r *http.Request) {</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log := logrus.New()</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>if value := os.Getenv("FUNCTION_TARGET"); value == "" {</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Infof("FUNCTION_TARGET is not set; falling back to normal logging.")</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>} else {</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>formatter := gcfstructuredlogformatter.New()</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.SetFormatter(formatter)</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>}</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Infof("This is an info message.")</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Warnf("This is a warning message.")</font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>log.Errorf("This is an error message.")</font></div><div><font face="courier" size="2"><br /></font></div><div><font face="courier" size="2"><span style="white-space: pre;"> </span>// YOUR CLOUD FUNCTION LOGIC HERE</font></div><div><font face="courier" size="2">}</font></div></div><div><br /></div><div>Hopefully this stopped you from banging your head against the wall for a few hours like I was doing as I tried to frantically figure out why the upgrade had failed in such weird ways.</div><div><br /></div><div><br /></div>Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-38649490747973389542020-06-03T22:39:00.003-04:002020-07-04T18:56:23.910-04:00Sharing a single screen in Slack for LinuxI have a bunch of monitors, and for whatever reason, Slack for Linux refuses to let me limit my screen sharing to a single monitor or application. 
This means that if I try to share my screen on a call, no one can see or read anything because they just see a giant, wide view of three monitors' worth of pixels crammed into their Slack window (typically only one monitor wide).<div><br /></div><div>In my experience, disabling monitors/displays is just not worth it; I'll have to spend too much time getting everything set back up correctly afterward, and that's really inconvenient and annoying.</div><div><br /></div><div>The solution that I've landed on is <a href="https://freedesktop.org/wiki/Software/Xephyr/">Xephyr</a>; Xephyr runs a second X11 server inside a new window, so when I need to get on a call where I'll have to share my screen, I simply:</div><div><ol style="text-align: left;"><li>Launch a new Xephyr display.</li><li>Close Slack.</li><li>Open Slack on the Xephyr display.</li><li>Open whatever else I'll need to share in the Xephyr display, typically a web browser or a terminal.</li><li>Get on the Slack call and share my "screen".</li></ol><div>Some small details:</div></div><div><ul style="text-align: left;"><li>You'll need to open Xephyr with the resolution that you want; given window decorations and such, you may need to play around with this a bit. Once you find out what works, put it in a script.</li><li>In order to resize windows in Xephyr, it'll need to be running a window manager. I struggled to get any "startx"-related things working, but I found that "twm" worked well enough for my purposes.</li><li>Some applications, such as Chrome, won't open on two displays at the same time. I just open a different browser in my Xephyr display (for example, I use "google-chrome" normally and "chromium-browser" in Xephyr), but you can also run Chrome using a different profile directory and it'll run in the other display.</li></ul><div>Install Xephyr and TWM:</div></div><div><pre style="text-align: left;">sudo apt install xserver-xephyr twm</pre></div><div><br /></div><div>Run Xephyr, Slack, and Chromium:</div><pre style="text-align: left;"># Launch Xephyr and create display ":1".<br />Xephyr -ac -noreset -screen 1920x1000 :1 &<br /># Start a window manager in Xephyr.<br />DISPLAY=:1 twm &>/dev/null &<br /># Open Slack in Xephyr.<br />DISPLAY=:1 slack &>/dev/null &<br /># Open Chromium in Xephyr.<br />DISPLAY=:1 chromium-browser &>/dev/null &</pre><div><br /></div><div>It's kind of dirty, but it works extremely well, and I don't have to worry about messing with my monitor setup when I need to give a presentation.</div><div><br /></div><div><i>Edit: an earlier version of this post used "Xephyr -bc -ac -noreset -screen 1920x1000 :1 &" for the Xephyr command; I can't get this to work with "-bc" anymore; I must have copied the wrong command when I published the post.</i></div>Anonymousnoreply@blogger.com3tag:blogger.com,1999:blog-1498439248860252027.post-44627577771288723682020-01-29T22:47:00.000-05:002020-01-29T22:48:15.913-05:00Unit-testing reCAPTCHA v2 and v3 in GoI recently worked on a project where we allowed new users to sign up for our system with a form. A new user would need to provide us with her name, her e-mail address, and a password. In order to prevent spamming, we used <a href="https://developers.google.com/recaptcha/docs/v3">reCAPTCHA v3</a>, and so that meant that we also submitted a reCAPTCHA token along with the rest of the new-user data.<br />
<br />
Unit-testing the sign-up process was fairly simple if we turned off the reCAPTCHA requirement, but the weakest link in the whole process is the one part that we could not control: reCAPTCHA. It would be foolish not to have test coverage around the reCAPTCHA workflow.<br />
<br />
So, how do you unit-test reCAPTCHA?<br />
<h2>
Focus: Server-side testing</h2>
<div>
For the purposes of this post, I'm going to be focusing on testing reCAPTCHA on the server side. This means that I'm not concerned with validating that users acted like humans fiddling around on a website. Instead, I'm concerned with what our sign-up endpoint does when it receives valid and invalid reCAPTCHA tokens.</div>
<h2>
reCAPTCHA in Go</h2>
<div>
There are a variety of Go packages that provide reCAPTCHA support; however, only one of them (1) has support for Go modules, and (2) has support for unit testing built in:</div>
<div>
<a href="https://github.com/tekkamanendless/go-recaptcha">https://github.com/tekkamanendless/go-recaptcha</a></div>
<div>
<br /></div>
<div>
For docs, see:</div>
<div>
<a href="https://godoc.org/github.com/tekkamanendless/go-recaptcha">https://godoc.org/github.com/tekkamanendless/go-recaptcha</a></div>
<div>
<br /></div>
<div>
When you create a new reCAPTCHA site, you're given public and private keys (short little strings, nothing huge). The public key is used on the client side when you make your connection to the reCAPTCHA API, and a response token is provided back. The private key is used on the server side to connect to the reCAPTCHA API and validate the response token.</div>
<div>
<br /></div>
<div>
Since the client side will likely be a line or two of Javascript that generates a token, our server-side work will be focused on validating that token.</div>
<div>
<br /></div>
<div>
Assuming that the newly generated token is "NEW_TOKEN" and that the private key is "YOUR_PRIVATE_KEY", then this is all you have to do in order to validate that token:</div>
<br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">import "github.com/tekkamanendless/go-recaptcha" </span><br />
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// ...</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">recaptchaVerifier := recaptcha.New("YOUR_PRIVATE_KEY")<br />success, err := recaptchaVerifier.Verify("NEW_TOKEN")<br />if err != nil {<br /> // Fail with some 500-level error about not being able to verify the token<br />}</span><br />
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br />if !success {<br /> // Fail with some 400-level error about not being a human<br />}</span><br />
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br />// The token is valid!</span><br />
<div>
<br />
<div>
And that's it! All we really care about is whether or not it worked.</div>
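<div>
<br /></div>
<div>
If it helps to see that check in context, here's a rough sketch (not the project's actual endpoint) of how it might sit inside a sign-up handler; the "recaptcha-token" form field name is just an assumption for illustration, so use whatever your client actually sends:</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// A sketch of a sign-up handler that gates account creation on the reCAPTCHA check.<br />http.HandleFunc("/sign-up", func(w http.ResponseWriter, r *http.Request) {<br /> // "recaptcha-token" is an assumed form field name.<br /> success, err := recaptchaVerifier.Verify(r.FormValue("recaptcha-token"))<br /> if err != nil {<br /> // Fail with some 500-level error about not being able to verify the token.<br /> http.Error(w, "Could not verify the reCAPTCHA token.", http.StatusInternalServerError)<br /> return<br /> }<br /> if !success {<br /> // Fail with some 400-level error about not being a human.<br /> http.Error(w, "Sorry, you don't appear to be a human.", http.StatusBadRequest)<br /> return<br /> }<br /><br /> // The token is valid; create the new user.<br />})</span></div>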
<h2>
Unit testing</h2>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">tekkamanendless/go-recaptcha</span> includes a package called "recaptchatest" that provides a fake reCAPTCHA API running as an <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">httptest.Server</span> instance. This server simulates enough of the reCAPTCHA API to let you do the kinds of testing that you need to.</div>
</div>
</div>
</div>
</div>
<div>
<br /></div>
<div>
Just like the actual reCAPTCHA service, you can create multiple "sites" on the test server. Each site will have a public and private key, and you can call the <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">NewResponseToken</span> method of a site to have that site generate a valid token for that site.</div>
<div>
<br /></div>
<div>
In terms of design, you'll set up the test server, the test site, and the valid token in advance of your test. When you create your <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">Recaptcha</span> instance with the test site's private key, all you have to do is set the <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">VerifyEndpoint</span> property of that instance to point to the test server (otherwise, it would try to talk to the real reCAPTCHA API and fail).</div>
<div>
<br /></div>
<div>
Here's a simple example:</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">import (</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> "github.com/tekkamanendless/go-recaptcha" </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> "github.com/tekkamanendless/go-recaptcha/recaptchatest"</span></div>
</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// ...</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// Create a new reCAPTCHA test server, site, and valid token before the main test.</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">testServer := recaptchatest.NewServer()</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">defer testServer.Close()</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">site := testServer.NewSite()</span></div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">token := site.NewResponseToken()</span></div>
</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// Create the reCAPTCHA verifier with the site's private key.</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">recaptchaVerifier := recaptcha.New(site.PrivateKey)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// Override the endpoint so that it uses the test server.</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">recaptchaVerifier.VerifyEndpoint = testServer.VerifyEndpoint()</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// Run your test.</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// ...</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// Validate that the reCAPTCHA token is good.</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">success, err := recaptchaVerifier.Verify(token)</span></div>
</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">assert.Nil(t, err)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">assert.True(t, success)</span></div>
<div>
<br /></div>
<div>
The <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">recaptchatest</span> test server doesn't do too much that's fancy, but it will properly return a failure if the same token is verified twice or if the token is too old. It also has some functions to let you tweak the token properties so you don't have to wait around for 2 minutes for a token to age out; you can make one that's already too old (see <a href="https://godoc.org/github.com/tekkamanendless/go-recaptcha/recaptchatest#Site.GenerateToken">Site.GenerateToken</a> for more information).</div>
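<div>
<br /></div>
<div>
For example, picking up right after the test above (where the token has already been verified once), a second call to <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">Verify</span> should come back as a failure. This is a sketch that assumes the reuse failure surfaces as a false result rather than as an error:</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">// Verifying the same token a second time should not succeed.<br />success, err = recaptchaVerifier.Verify(token)<br />assert.Nil(t, err)<br />assert.False(t, success)</span></div>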
Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-47605914629773419852019-10-14T18:17:00.000-04:002019-10-14T18:17:18.481-04:00Encrypt your /home directory using LUKS and a spare diskEvery year or two, I rotate a drive out of my NAS. My most recent rotation left me with a spare 1TB SSD. My main machine only had a 250GB SSD, so I figured that I'd just replace my /home directory with a mountpoint on that new disk, giving me lots of space for video editing and such, since I no longer had the room to deal with my GoPro footage.<br />
<br />
My general thought process was as follows:<br />
<br />
<ol>
<li>I don't want to mess too much with my system.</li>
<li>I don't want to clone my whole system onto the new drive.</li>
<li>I want to encrypt my personal data.</li>
<li>I don't really care about encrypting the entire OS.</li>
</ol>
<div>
I had originally looked into some other encryption options, such as encrypting each user's home directory separately, but even in the year 2019 there seemed to be too much drama dealing with that (anytime that I need to make a PAM change, it's a bad day). Using LUKS, the disk (well, partition) is encrypted, so everything kind of comes for free after that.</div>
<div>
<br /></div>
<div>
If you register the partition in /etc/crypttab, your machine will prompt you for the decryption key when it boots (at least Kubuntu 18.04 does).</div>
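<div>
<br /></div>
<div>
For reference, the /etc/crypttab entry for this setup would look something like the following; the UUID here is just a placeholder for your partition's actual UUID (which "blkid" will show you), and "none" means that there's no key file, so you'll be prompted for the passphrase at boot:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">encrypted-home UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx none luks</span></blockquote>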
<div>
<br /></div>
<div>
One other thing: dealing with encrypted data may be slow if your processor doesn't support hardware-accelerated AES (the AES-NI instruction set). Do a quick check and make sure that "aes" is listed under "Flags":</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">lscpu;</span></blockquote>
If "aes" is there, then you're good to go. If not, then maybe run some tests to see how much CPU overhead disk operations use on LUKS (you can follow this guide, but stop before "Home setup, phase 2", and see if your overhead is acceptable).<br />
<h2>
The plan</h2>
<div>
<ol>
<li>Luks Setup</li>
<ol>
<li>Format the new disk with a single partition.</li>
<li>Set up LUKS on that partition.</li>
<li>Back up the LUKS header data.</li>
</ol>
<li>Home setup, phase 1</li>
<ol>
<li>Copy everything in /home to the new partition.</li>
<li>Update /etc/crypttab.</li>
<li>Update /etc/fstab using a test directory.</li>
<li>Reboot.</li>
<li>Test.</li>
</ol>
<li>Home setup, phase 2</li>
<ol>
<li>Update /etc/fstab using the /home directory.</li>
<li>Reboot.</li>
<li>Test.</li>
</ol>
</ol>
<h3>
LUKS setup</h3>
<div>
Wipe the new disk and make a single partition. For the remainder of this post, I'll be assuming that the partition is /dev/sdx1.</div>
</div>
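<div>
If you need a starting point, something like this with parted should do it (this assumes the new disk shows up as /dev/sdx; double-check with "lsblk" first, since these commands are destructive):</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo parted /dev/sdx mklabel gpt;</span></blockquote>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo parted -a optimal /dev/sdx mkpart primary 0% 100%;</span></blockquote>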
<div>
<br /></div>
<div>
Install "cryptsetup".</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo apt install cryptsetup; </span></blockquote>
<div>
Set up LUKS on the partition. You'll need to give it a passphrase. I recommend something that's easy to type, like a series of four random words (but you do you). You'll have to type this passphrase every time you boot your machine.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo cryptsetup --verify-passphrase luksFormat /dev/sdx1;</span></blockquote>
<div>
Once that's done, you can add more passphrases (a LUKS1 header has 8 key slots in total, so up to 7 more). This may be helpful if you want other people to be able to access the disk, or if you just want some backups, just in case. If there are multiple passphrases, any one of them will work fine; you don't need to have multiple on hand.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo cryptsetup --verify-passphrase luksAddKey /dev/sdx1;</span></blockquote>
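<div>
To see which key slots are in use (or to confirm that a new passphrase actually took), you can dump the header metadata:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo cryptsetup luksDump /dev/sdx1;</span></blockquote>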
<div>
The next step is to "open" the partition. The last argument ("encrypted-home") is the name to use for the decrypted mapping that will appear under "/dev/mapper".</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo cryptsetup luksOpen /dev/sdx1 encrypted-home;</span></blockquote>
<div>
At this point, everything is set up and ready. Confirm that with the "status" command.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo cryptsetup status encrypted-home;</span></blockquote>
<div>
Back up the LUKS header data. If this information gets corrupted on the disk, then there is no way to recover your data. Note that if you recover data using the header backup, then the passphrases will be the ones in the header backup, not whatever was on the disk at the time of the recovery.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo cryptsetup luksHeaderBackup /dev/sdx1 --header-backup-file /root/luks.encrypted-home.header;</span></blockquote>
<div>
I put mine in the /root folder (which will not be on the encrypted home partition), and I also backed it up to Google Drive. Remember, if you add, change, or delete passphrases, you'll want to make another backup (otherwise, those changes won't be present during a restoration operation).</div>
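<div>
For the record, restoring is the mirror-image command (shown here with the same backup file path as above):</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo cryptsetup luksHeaderRestore /dev/sdx1 --header-backup-file /root/luks.encrypted-home.header;</span></blockquote>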
<div>
<br /></div>
<div>
If you're really hardcore, fill up the partition so that no part of it looks special. Remember, the whole point of encryption is that whatever you write ends up looking random on the underlying disk, so writing a bunch of zeros through the mapping with "dd" will do the trick:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo dd if=/dev/zero of=/dev/mapper/encrypted-home;</span></blockquote>
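<div>
The plain "dd" invocation above works but is slow and silent; a larger block size and progress reporting make it more bearable (it will end with a "No space left on device" error when the mapped partition is full, which is expected):</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo dd if=/dev/zero of=/dev/mapper/encrypted-home bs=1M status=progress;</span></blockquote>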
<div>
Before you can do anything with it, you'll need to format the partition. I used EXT4 because everything else on this machine is EXT4.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo mkfs.ext4 /dev/mapper/encrypted-home;</span></blockquote>
<h3>
Home setup, phase 1</h3>
<div>
Once the LUKS partition is all set up, the next set of steps is just a careful copy operation, tweaking a couple /etc files, and verifying that everything worked.</div>
<div>
<br /></div>
<div>
The safest thing to do would be to switch to a live CD here so that you're guaranteed to not be messing with your /home directory, but I just logged out of my window manager and did the next set of steps in the ctrl+alt+f2 terminal. Again, you do you.</div>
<div>
<br /></div>
<div>
Mount the encrypted home directory somewhere where we can access it.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo mkdir /mnt/encrypted-home;</span> </blockquote>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo mount /dev/mapper/encrypted-home /mnt/encrypted-home;</span></blockquote>
<div>
Copy over everything in /home. This could take a while.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo cp -a /home/. /mnt/encrypted-home/;</span></blockquote>
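<div>
If you'd rather use something you can safely re-run if it gets interrupted, rsync should work just as well (the extra flags preserve hard links, ACLs, and extended attributes):</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo rsync -aHAX /home/. /mnt/encrypted-home/;</span></blockquote>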
<div>
Make sure that /mnt/encrypted-home contains the home folders of your users.</div>
<div>
<br /></div>
<div>
Set up /etc/crypttab. The format is:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">${/dev/mapper name} UUID="${disk uuid}" none luks</span></blockquote>
<div>
In our case, the /dev/mapper name is going to be "encrypted-home". To find the UUID, run:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo blkid /dev/sdx1;</span></blockquote>
<div>
So, in my particular case, /etc/crypttab looks like:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">encrypted-home UUID="5e01cb97-ceed-40da-aec4-5f75b025ed4a" none luks</span></blockquote>
<div>
Finally, tell /etc/fstab to mount the partition to our /mnt/encrypted-home directory. We don't want to clobber /home until we know that everything works.</div>
<div>
<br /></div>
<div>
Update /etc/fstab and add:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">/dev/mapper/encrypted-home /mnt/encrypted-home ext4 defaults 0 0</span></blockquote>
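<div>
Before rebooting, you can have util-linux sanity-check the new /etc/fstab entry (this won't catch every possible problem, but it's a cheap test):</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo findmnt --verify;</span></blockquote>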
<div>
Reboot your machine.</div>
<div>
<br /></div>
<div>
When it comes back up, it should ask you for the passphrase for the encrypted-home partition. Give it one of the passphrases that you set up.</div>
<div>
<br /></div>
<div>
Log in and check /mnt/encrypted-home. As long as everything's in there that's supposed to be in there (that is, all of your /home data), then phase 1 is complete.</div>
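<div>
If you want more reassurance than eyeballing the directory listing, a quick spot check with diff works; files that have changed since the copy (browser caches, log files, and so on) will show up as noise, so don't panic over a handful of differences:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo diff -rq /home /mnt/encrypted-home | head -n 20;</span></blockquote>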
<h3>
Home setup, phase 2</h3>
<div>
Now that we know everything works, the next step is to clean up your actual /home directory and then tell /etc/fstab to mount /dev/mapper/encrypted-home at /home.</div>
<div>
<br /></div>
<div>
I didn't want to completely purge my /home directory; instead, I deleted everything large and/or personal in there (leaving my bash profile, some app settings, etc.). This way, if my new disk failed or if I wanted to use my computer without it for some reason, then I'd at least have normal, functioning user accounts. Again, you do you. I've screwed up enough stuff in my time to appreciate having a reasonably nice fallback scenario ready to go.</div>
<div>
<br /></div>
<div>
Update /etc/fstab and change the /dev/mapper/encrypted-home line to mount to /home.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">/dev/mapper/encrypted-home /home ext4 defaults 0 0</span></blockquote>
<div>
Reboot.</div>
<div>
<br /></div>
<div>
<div>
When it comes back up, it should ask you for the passphrase for the encrypted-home partition. Give it one of the passphrases that you set up.</div>
<div>
<br /></div>
</div>
<div>
Log in. You should now be using an encrypted home directory. Yay.</div>
<div>
<br /></div>
<div>
To confirm, check your mountpoints:</div>
<div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">mount | grep /home</span></blockquote>
<div>
You should see something like:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">/dev/mapper/encrypted-home on /home type ext4 (rw,relatime,data=ordered)</span></blockquote>
</div>
<div>
Now that everything's working, you can get rid of "/mnt/encrypted-home"; we're not using it anymore.</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sudo rmdir /mnt/encrypted-home;</span></blockquote>
<div>
<br /></div>
Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1498439248860252027.post-84823268683763651392018-03-23T10:37:00.002-04:002018-03-23T10:38:17.354-04:00Fix for when Chrome stops making screen updatesMy desktop environment is KDE, and I use Chrome for my browser. At any given time, I'll have 2-5 windows with 10-40 tabs each. However, every once in a while (usually once every week or so), the rendering of Chrome will freeze. That is, the entire window will remain frozen (visually), but clicks and everything else go through fine (you just can't see the results). Changing my window focus (switching to a different window, opening the "K" menu, etc.) usually causes a single render, but that doesn't help with actually interacting with a (visually) frozen Chrome window.<br />
<br />
Closing Chrome and opening it back up works, but that's really inconvenient.<br />
<br />
I'm still not sure why this happens, but I do have a quick (and convenient) fix: change your compositor's rendering backend (and then change it back). Why does this work? I'm not sure, but since it's obviously a rendering problem, making a rendering change makes sense.<br />
<br />
Step by step:<br />
<br />
<ol>
<li>Open "System Settings".</li>
<li>Open "Display and Monitor".</li>
<li>Go to "Compositor".</li>
<li>Change "Rendering backend" from whatever it is to something else (usually "OpenGL 3.1" to "OpenGL 2.0" or <i>vice versa</i>).</li>
<li>Click "Apply".</li>
</ol>
<div>
This always solves the problem for me. You can even switch it back to the original value after you hit "Apply" the first time.</div>
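<div>
Presumably, suspending and resuming compositing entirely would also force a fresh render; the default shortcut for that is alt+shift+f12, and the command-line equivalent (assuming KWin's usual D-Bus interface is available) would be something like:</div>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">qdbus org.kde.KWin /Compositor suspend;</span></blockquote>
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">qdbus org.kde.KWin /Compositor resume;</span></blockquote>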
<div>
<br /></div>
Anonymousnoreply@blogger.com0