My Weekly Retrospectives

Week of 2022-11-14

This Week, In Brief

Draft the first version of Ascent Checklist CI/CD within Kubernetes clusters!

Highlights

Goals

Goal Grades

Create source code repository validator

Create Ascent Checklist schema and validation

Create Ascent Checklist provisioner with a simple test deployment scenario

More Stuff

ClusterAPI works great…until it doesn’t

I spent my entire week fighting ClusterAPI, which was really frustrating.

Long story short…is still pretty long. Posted below.

I really hate running into dealbreaking walls like this.

ClusterAPI works great…until it doesn’t, the novella

I hope this entry makes sense, because this journey was extremely confusing.

The end-to-end tests for the Kubernetes Developer Platform Provisioner used Kubernetes v1.22.1 for clusters created within Management Planes and Platforms. This led to a few problems:

cgroup confusion

The kubelet service would not start on the Platform clusters due to this error: "failed to run Kubelet: invalid configuration: cgroup-root ["kubelet"] doesn't exist". Based on this thread, I think this was occurring due to a conflict between the OS my Docker Engine was running on (Alpine) and kubeadm's bootstrapping assumptions.

Because these nodes run within Docker, they need to volume-mount the host's cgroup mounts into the container so that containers created within Pods receive the correct control groups. However, the location of these cgroup mounts varies depending on whether or not the host uses systemd to manage cgroups.

Alpine uses OpenRC instead of systemd and has non-conventional mount points for cgroups. This trips up the kubelet and prevents it from starting.
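Since the node containers inherit whatever cgroup layout the host exposes, the quickest way to see what the kubelet is actually dealing with is to list the cgroup mounts. This is a sketch of the check, not the exact commands I ran; running it inside a node container (e.g. via docker exec) shows the kubelet's view.

```shell
# Sketch: list how cgroups are mounted. On a systemd host you get the
# conventional layout under /sys/fs/cgroup (one mount per controller for
# v1, or a single cgroup2 mount); Alpine/OpenRC lays this out differently,
# which is what the kubelet chokes on.
grep cgroup /proc/self/mounts
```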

This is further complicated by the fact that (a) cluster-api uses Ubuntu in their end-to-end tests, and (b) they don't matrix their tests to account for other operating systems (or at least I didn't see any evidence of this from looking at their GitHub workflows).

lots of version incompatibilities

Backwards compatibility does not seem to exist between capd-controller-manager and cluster manifests generated by older versions of clusterctl. (For example, a cluster manifest generated by clusterctl 1.2.5 will be accepted by capd-controller-manager 1.2.5, but a manifest generated by clusterctl 1.2.4 will cause unpredictable behavior when submitted to capd-controller-manager 1.2.5.)
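The takeaway can be captured as a tiny pre-flight guard. This is an illustrative sketch: in practice the left-hand value would come from `clusterctl version` and the right-hand value from the capd-controller-manager image tag, but the version literals below are just stand-ins.

```shell
# Sketch: refuse to proceed when the clusterctl that generated a manifest
# doesn't match the provider it's being submitted to. Versions here are
# illustrative, mirroring the 1.2.4-vs-1.2.5 mismatch described above.
same_version() {
  [ "$1" = "$2" ]
}

if same_version "v1.2.4" "v1.2.5"; then
  echo "versions match"
else
  echo "mismatch: regenerate the manifest with the matching clusterctl"
fi
```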

I noticed this while trying to get a working control plane provisioned on an Ubuntu machine running the Docker Engine. In this scenario (a kind cluster running v1.23, clusterctl version 1.2.5, requesting a Kubernetes cluster running 1.22.1), the kubelet within capd-provisioned nodes would not start because the kube-apiserver container would not start. Logs from the failing kube-apiserver container showed that it was failing to start due to an unknown PodSecurityConfiguration resource being created during Pod admission.

Looking at the Pod configuration for the kube-apiserver showed that it mounted base manifests from /etc/kubernetes/manifests on the node. Sure enough, there was a manifest in this folder that defined a PodSecurityConfiguration within an AdmissionConfiguration resource, under the pod-security.admission.config.k8s.io/v1beta1 API, which did not ship with 1.22.1 (this was actually in v1 of that API).
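For reference, this is roughly the shape of the offending manifest, reconstructed as a sketch from the upstream PodSecurity docs rather than copied from the node. The nested apiVersion is the part a v1.22.1 kube-apiserver does not recognize.

```shell
# Sketch of the admission configuration shape; the nested apiVersion on
# the PodSecurity plugin is what an older apiserver rejects.
cat <<'EOF'
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    # a v1.22.1 kube-apiserver does not know this API version
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
EOF
```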

(I knew to look at the kube-apiserver Pod configuration because the systemd unit for the kubelet showed that it loads static Pods from the /etc/kubernetes/manifests folder.)
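The kubelet config field that creates that static-Pod link is staticPodPath. The snippet below is a minimal illustrative KubeletConfiguration, not the one from the node, just to show where the /etc/kubernetes/manifests wiring lives.

```shell
# Sketch: staticPodPath is the field that points the kubelet at the
# static-Pod manifest folder. Minimal illustrative config shown inline.
cat <<'EOF' | grep staticPodPath
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests
EOF
```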

As it happens, this admission configuration was created as a result of this PR…which assumed a later version of Kubernetes.

TL;DR: I should have read this before arbitrarily deciding to use k8s v1.22.1 for everything…and running containerd/Docker Engine on Alpine is officially a Bad Idea™.