Having installed a new cluster and gotten ArgoCD running on it, it’s time to think about deploying some applications. From earlier, I have simple applications that are very much tailored to running on my already existing cluster:
- ZFS storage classes with paths that match my disk layout
- Load balancer IP ranges for my home network
- external-dns entries pointing IPv4 records at my outside IP address
- Firewall entries configured with my Unifi Network Operator
Setting up a DR cluster, I need my DR cluster to create similar applications with some properties changed. And this is where ApplicationSets become useful.
An ApplicationSet is basically a way to template several applications that should be similar but have slight variations. There are multiple use cases, but the one I am after is creating a DR version and a PROD version of my applications. In addition, I want my PROD node to only run the PROD applications and my DR node to only run the DR applications. And this is where the cluster secret label from the previous article is useful: ArgoCD knows whether it’s running on a prod or a DR node and can act upon that.
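As a reminder, the clusters generator selects clusters based on labels on the ArgoCD cluster secret. A minimal sketch of what such a secret might look like on the prod node (the secret name and exact values here are my assumptions, not copied from the previous article):

apiVersion: v1
kind: Secret
metadata:
  name: in-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    env: prod            # the label the clusters generators below match on
stringData:
  name: in-cluster
  server: https://kubernetes.default.svc

The DR node carries the same kind of secret, just with env: dr.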
At this point, I think it’s easier with an example, so we’ll start with one of the very earliest things I need: The zfs-localpv operator and the ZFS Storage Classes that are managed by it.
Creating my first Application Set
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: zfs
  namespace: argocd
spec:
  goTemplate: true
  # We use two matrix generators (one for prod, one for dr).
  # Each matrix combines:
  #   - a clusters generator that matches the env label
  #   - a list generator that supplies appName/appPath/namespace
  generators:
    # ===== PROD =====
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: prod
          - list:
              elements:
                - appName: zfs-prod
                  appPath: application/zfs-storage/prod
                  namespace: openebs
                  env: prod
    # ===== DR =====
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: dr
          - list:
              elements:
                - appName: zfs-dr
                  appPath: application/zfs-storage/dr
                  namespace: openebs
                  env: dr
  # One shared template that uses the params provided by the matrices above
  template:
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "0"
      name: "{{ .appName }}"
      labels:
        app.kubernetes.io/name: argocd
        env: "{{ .env }}"
    spec:
      project: default
      source:
        repoURL: git@github.com:vegardengen/kubernetes-bootstrap.git
        targetRevision: main
        path: "{{ .appPath }}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{ .namespace }}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
The generators section is what enumerates which applications to create. There are a number of generator types, but I use the matrix generator to combine the clusters generator and the list generator. This ended up being pretty verbose and could probably be refined to reduce the duplication, but it works for me. In a more enterprisey setup than mine, these would probably even be created automatically with tools like Terraform and/or Ansible, but for now I rely on creating them manually. Since they are pretty standardized in my setup, I usually just copy a previous one, do a search/replace on the application name and the namespace, and then I have the base of a new ApplicationSet.
The matrix generator is the one driving which applications to generate. The matchLabels selector on the clusters generator makes sure that only the applications from the list generator that belong on the cluster this ArgoCD installation runs on get created, in effect creating DR applications on the DR node and PROD applications on the PROD node.
The elements on the list generators are the potentially varying properties in the applications the ApplicationSet should generate. As you can see, the appName is different. I could probably have made it the same, but keeping it different may make it easier later if I want to test the other kind of setup, where one ArgoCD installation manages both clusters. And then the appPath is different, to fetch the source from different paths in the repository, as sketched below.
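To make the paths concrete, this is roughly how that part of the bootstrap repo is laid out. The file names are the ones mentioned in this article; anything beyond that would be an assumption:

application/zfs-storage/
├── base/
│   ├── kustomization.yaml
│   ├── zfs-localpv-values.yaml
│   └── snapshotclass.yaml
├── prod/
│   ├── kustomization.yaml
│   ├── zfs-storageclass-nas.yaml
│   └── zfs-storageclass-znvm.yaml
└── dr/
    ├── kustomization.yaml
    ├── zfs-storageclass-nas.yaml
    └── zfs-storageclass-znvm.yaml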
In each of dr and prod, we have a kustomization.yaml that defines the properties. In the case of this application, the file is identical in both:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
  - zfs-storageclass-nas.yaml
  - zfs-storageclass-znvm.yaml
They both include the same common part, which lives in ../base (relative to the application path), and this common path includes the resources that should be identical between DR and prod. In this case, that’s a Helm installation of zfs-localpv plus one other resource, a snapshot class, so application/zfs-storage/base/kustomization.yaml has this content:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
helmCharts:
  - name: zfs-localpv
    repo: https://openebs.github.io/zfs-localpv
    releaseName: openebs-zfs
    version: 2.8.0
    namespace: openebs
    valuesFile: zfs-localpv-values.yaml
resources:
  - snapshotclass.yaml
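I’m not showing snapshotclass.yaml in this post, but a minimal sketch of what a VolumeSnapshotClass for zfs-localpv typically looks like is below; the name and deletionPolicy are my assumptions:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: zfs-snapclass        # assumed name
driver: zfs.csi.openebs.io   # the zfs-localpv CSI driver
deletionPolicy: Delete       # assumed policy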
But the storage classes are different, as they live in the dr and prod directories respectively. Here is the prod version of my zfs-storage-nas storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfs-storage-nas
provisioner: zfs.csi.openebs.io
allowVolumeExpansion: true
parameters:
  poolname: "nasdisk/k3s"
  fstype: "zfs"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
In DR, only the poolname is different:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfs-storage-nas
provisioner: zfs.csi.openebs.io
allowVolumeExpansion: true
parameters:
  poolname: "backup/nasdisk/k3s"
  fstype: "zfs"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
The backup prefix is simply because that’s where I had space for the storage: in my backup ZFS pool, which I also use for backups of my prod servers’ ZFS volumes. For a real professional setup, this is an unacceptable mix of roles, but I am doing this on a hobbyist budget, so some pragmatic decisions are bound to happen.
The template section is the one defining the properties on the ArgoCD Application resource, and as shown, appName and appPath differ. The env label is part of the magic that makes only DR resources get created on the DR node; it matches the selectors in the cluster generators in the ApplicationSet.
The effect of this ApplicationSet is that in the ArgoCD installation on my PROD node, a zfs-prod application is created, while on my DR node, a zfs-dr application is created, with the Helm installation and the snapshot class being identical but the storage classes having different definitions.
But this was only one applicationset….
On a level above all my bootstrap repo ApplicationSets, I have a bootstrap ApplicationSet which generates two app-of-apps, one on each node: bootstrap-prod and bootstrap-dr. While defining them, I populated only the DR application, leaving bootstrap-prod as an empty application, since the applications already existed in a non-ApplicationSet context. Then I carefully added the sub-ApplicationSets to prod, one by one, while deleting the previously set up separate applications from the repo. This made the zfs application a perfect start, as it really only does anything when I provision new ZFS PVCs, so messing it up on prod wouldn’t have been a disaster, and it’s also pretty simple.
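For reference, the bootstrap ApplicationSet follows exactly the same pattern as the zfs one above. A condensed sketch, where the appPath values and the argocd destination namespace are my assumptions about the repo layout:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: bootstrap
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: prod
          - list:
              elements:
                - appName: bootstrap-prod
                  appPath: bootstrap/prod    # assumed path
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: dr
          - list:
              elements:
                - appName: bootstrap-dr
                  appPath: bootstrap/dr      # assumed path
  template:
    metadata:
      name: "{{ .appName }}"
    spec:
      project: default
      source:
        repoURL: git@github.com:vegardengen/kubernetes-bootstrap.git
        targetRevision: main
        path: "{{ .appPath }}"
      destination:
        server: https://kubernetes.default.svc
        namespace: argocd
      syncPolicy:
        automated:
          prune: true
          selfHeal: true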
For Longhorn, the setup was more or less the same, although Longhorn is a bit more complex and needed more variable properties.
For Calico, I needed the load balancer IP pools to be different; other than that, the properties were more or less the same.
For Traefik, there were basically only some networking-related variations; the rest was similar.
For minio and gitea, I also needed external-dns, which, since I was managing the same zone, gave me some nasty surprises. external-dns will enumerate all the resources it manages in the cluster, then compare them with the external-dns-managed records in the external DNS zone, potentially deleting the ones it no longer should manage and creating new ones…
Well, so when I fired it up, it happily went along and deleted all the DNS entries created by my production cluster, because it thought it was supposed to manage them. Ouch!
Then production external-dns came along, saw «oh! I need to create some DNS records here!», and put them back in place. And DR external-dns would once again find entries to remove…
The solution to this little gotcha was to use a different owner id between DR and prod (an external-dns property that is added as metadata in the TXT records it maintains in the external DNS zone). With that, DR would find out that it didn’t own the other entries and stay away from them, and I was able to create separate minio-dr and gitea-dr records in my DNS zone while leaving the others alone.
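In practice this is just the --txt-owner-id flag on external-dns differing between the two clusters. A sketch of the container args on the DR node; the sources and the provider here are assumptions, only the owner id part is the point:

# external-dns container args (DR node)
args:
  - --source=service
  - --source=ingress
  - --provider=cloudflare        # assumed provider
  - --registry=txt
  - --txt-owner-id=dr            # prod uses a different id, e.g. --txt-owner-id=prod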
All of this took a while, but in the end, I had a bootstrap repo with applicationsets instead of the old applications, creating both prod and dr ArgoCD applications – up to and including gitea.
For gitea, I pulled in the production Longhorn volumes to create new volumes in DR with the same content, as I described in a previous blog post. I used CloudNativePG’s mechanisms to create a database based on the production backup.
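I won’t repeat the details here, but the CloudNativePG side boils down to bootstrapping the DR database cluster from the production backup via an external cluster definition. A minimal sketch, where the cluster names, bucket path, endpoint, secret names and storage class are all my assumptions:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: gitea-db-dr            # assumed name
  namespace: gitea
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: zfs-storage-nas   # assumed
  bootstrap:
    recovery:
      source: gitea-db-prod    # points at the external cluster below
  externalClusters:
    - name: gitea-db-prod
      barmanObjectStore:
        destinationPath: s3://cnpg-backups/gitea-db   # assumed bucket/path
        endpointURL: https://s3.example.com           # assumed S3 endpoint
        s3Credentials:
          accessKeyId:
            name: s3-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-credentials
            key: SECRET_ACCESS_KEY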
I was also briefly on a setup where I synchronized all the Longhorn backups between my on-prem minio and my DR minio. In this process, the I/O of my nodes quickly became a bottleneck, affecting their stability, so I ended up changing to sending the Longhorn backups directly to a third-party S3-compatible storage solution, and then having DR pull in those backups directly. This was much nicer to my clusters, which were now able to do something other than pushing backups over the network…
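In Longhorn terms, that change is just a matter of pointing the backup target at the external S3 endpoint on both clusters, for example via the Helm values; the bucket, region and secret name below are assumptions:

# Longhorn Helm values (sketch)
defaultSettings:
  backupTarget: s3://longhorn-backups@eu-central-1/   # assumed bucket and region
  backupTargetCredentialSecret: longhorn-s3-secret    # assumed secret holding AWS_* keys and AWS_ENDPOINTS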
In the end, I had basically the same ApplicationSets as I had applications in the bootstrap repo, except that I discovered that some things, like external-dns, were wise to move into the bootstrap repo, since gitea and ArgoCD depended on it. I won’t go into any detail on the rest of my ApplicationSets, as the process of migrating them is more or less the same, except that what needs to vary between DR and PROD might be different for each application.
Once I had the DR gitea provisioned with the same repos as we had in prod gitea, I could start on the task of migrating my gitea-hosted ArgoCD applications to applicationsets, but this will be the topic of part 4 of this series.