Setting up a second DR cluster part 1 – bootstrap script.


My primary cluster was installed manually. While I have retroactively created some scripts and put them in my bootstrap repository, they had never been tested end-to-end. Creating a DR cluster (still only one node) was a perfect opportunity to test that.

The goal of the bootstrap repo/script is to have a scripted procedure for installing the components needed for my ArgoCD to work and deploy the rest of my workload.

I also needed my repo to be configured for two different clusters, running the same basic platform, but with configuration potentially varying slightly. Since my DR cluster runs in a completely different network, my externally exposed IP addresses/load balancer pools will vary, for example. Another thing that is different is the disk layout, and the DR node is also not as powerful – which I can potentially change if I end up having to run DR for my whole workload for an extended time, since it’s a cloud server.

But back to my installation script. After a whole lot of reinitializations, this is my master installation script:

#!/bin/sh
# Clean up any leftovers from previous installation attempts
rm -rf /etc/rancher /etc/cni /var/lib/rancher /var/lib/kubelet
dir=`dirname $0`
cd $dir
$dir/install-k3s.sh
$dir/install-calico.sh
$dir/install-sealedsecrets.sh
$dir/install-argocd.sh
# Label the in-cluster secret with this node's environment (prod or dr)
nodename=`uname -n`
env=`grep $nodename nodes.txt | awk '{print $2}'`
kubectl label secrets -n argocd in-cluster env=$env
# Pre-apply the ArgoCD network policy and kick off the bootstrap applicationset
kubectl apply -f $dir/../application/argocd/base/calicopolicy.yaml
kubectl create -f $dir/../application/bootstrapappset.yaml

The first rm is to clean up from my previous installation attempts.

The rest of it, I’ll go through step by step.

Installing k3s

#!/bin/bash
set -e

curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_ENABLE=true sh -

mkdir -p /etc/rancher/k3s
cp "$(dirname "$0")/k3s/config.yaml" /etc/rancher/k3s/config.yaml
cp "$(dirname "$0")/k3s/audit-policy.yaml" /etc/rancher/k3s/audit-policy.yaml

cp "$(dirname "$0")/k3s/k3s.service" /etc/systemd/system/k3s.service

systemctl daemon-reexec
systemctl daemon-reload
systemctl enable --now k3s

I have pre-created my standard config.yaml and audit-policy.yaml, as well as my modified k3s.service, and checked them into the repo. This basically installs and starts k3s with my customized configuration.
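To give an idea of what goes into it, here is a minimal sketch of what a config.yaml along these lines could look like – the values are assumptions, not my actual file. The important parts are that the built-in Flannel networking and its network policy controller are disabled (Calico takes over), and that the audit policy is wired into the API server:

# /etc/rancher/k3s/config.yaml – illustrative sketch only
flannel-backend: "none"          # disable the built-in CNI, Calico is installed separately
disable-network-policy: true     # Calico handles network policy as well
write-kubeconfig-mode: "0600"
kube-apiserver-arg:
  - "audit-policy-file=/etc/rancher/k3s/audit-policy.yaml"
  - "audit-log-path=/var/lib/rancher/k3s/server/logs/audit.log"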

Install Calico

My installation comes with the built-in networking layer disabled, so for anything to work at all, I’ll really need to install a networking layer.

#!/bin/sh
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.1/manifests/operator-crds.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.1/manifests/tigera-operator.yaml
kubectl create -f ../application/calico/base/installation-default.yaml
kubectl create -f ../application/calico/base/apiserver.yaml

Again this took some trial and error. The installation-default.yaml is the one I have checked into my repo and manage there, but I need to install it beforehand. The same goes for the Calico API server, which didn’t get provisioned by the previous steps.
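For reference, those two custom resources are roughly of this shape – a minimal sketch of the Tigera operator resources, not my actual files, and the pod CIDR here is an assumption (10.42.0.0/16 is the k3s default):

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - cidr: 10.42.0.0/16        # must match the cluster's pod CIDR
        encapsulation: VXLAN
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}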

Install sealed-secrets

For even ArgoCD to work, I need to have sealed secrets working. It has some secrets, among others the SSH keys that are needed to access the repositories with its configuration. Those are checked into the repository, but encrypted with my main cluster’s encryption keys.

#!/bin/bash
set -e

NAMESPACE=sealed-secrets
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm repo update

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -

echo "**** Sync the keys!"

# Pause: copy the sealed-secrets keys over from the main cluster before continuing
read a

helm install sealed-secrets sealed-secrets/sealed-secrets \
  --namespace $NAMESPACE \
  -f sealed-secrets-values.yaml

The pause at the echo/read is the only manual part of my installation. The keys live in the main cluster and have to be copied over manually before the new cluster is able to decrypt the secrets checked into the repo. In fact, new keys need to be synced over periodically, so this will be a manual maintenance step. By default you get a new key every month, but old secrets are of course still encrypted with the old keys. Unless you take steps to re-encrypt, you need all the keys:

kubectl get secrets -n sealed-secrets -l sealedsecrets.bitnami.com/sealed-secrets-key==active -o yaml

This is basically the keys to your vault and should be handled with care. In fact, I piped it directly into ssh and added it directly into the cluster:

kubectl get secrets -n sealed-secrets -l sealedsecrets.bitnami.com/sealed-secrets-key==active -o yaml | ssh -i .ssh/id_rsa.key user@drnode 'kubectl apply -f -'

This is safe to rerun periodically: keys that already exist are simply re-applied unchanged, while new keys are added cleanly.

Having copied over the keys, I can continue the scripts and have sealed-secrets installed.

Install ArgoCD

With all of this installed, I am ready to install ArgoCD and provision infrastructure and applications.

$ less install-argocd.sh 
#!/bin/bash
set -e

NAMESPACE=argocd
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

helm upgrade --install argocd argo/argo-cd \
  --namespace $NAMESPACE \
  -f argocd-values.yaml

kubectl apply -f ../application/argocd/base/bootstrapkey.yaml
kubectl apply -f ../application/argocd/base/argo-repo-creds-sealed.yaml
kubectl apply -f ../application/argocd/base/service-account-and-role.yaml


# Extract the service account token and CA certificate to build the in-cluster secret
SA=argocd-manager
NS=argocd
SEC=$(kubectl -n $NS get sa $SA -o jsonpath='{.secrets[0].name}')
TOKEN=$(kubectl -n $NS get secret $SEC -o jsonpath='{.data.token}' | base64 -d)
CA_B64=$(kubectl -n $NS get secret $SEC -o jsonpath='{.data.ca\.crt}')

cat <<EOF | kubectl -n argocd apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: in-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: in-cluster
  server: https://kubernetes.default.svc
  config: |
    {
      "bearerToken": "${TOKEN}",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "${CA_B64}"
      }
    }
EOF

This step also took some trial and error.

The bootstrapkey is the SSH key giving access to the bootstrap repository, which lives on GitHub.
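Before sealing, the underlying secret follows ArgoCD’s standard declarative repository credential format – the snippet below is a sketch with an assumed name and repo URL, not my actual bootstrapkey.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: bootstrap-repo            # assumed name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: git@github.com:example/bootstrap.git   # assumed URL
  sshPrivateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    ...
    -----END OPENSSH PRIVATE KEY-----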

The service-account-and-role is this, and I really only need it for this in-cluster secret stuff. I basically asked ChatGPT about this. It’s been fairly good at providing suggestions, but not perfect. This one worked, though 🙂

apiVersion: v1
kind: ServiceAccount
metadata:
  name: argocd-manager
  namespace: argocd
---
# 2) RBAC (tighten rules as you wish; below is the broad/default style)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: argocd-manager-role
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
  - nonResourceURLs: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argocd-manager-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argocd-manager-role
subjects:
  - kind: ServiceAccount
    name: argocd-manager
    namespace: argocd
---
# 3) Long-lived token Secret *skeleton*
# K8s >=1.24 will populate the token for this SA after you apply this file.
apiVersion: v1
kind: Secret
metadata:
  name: argocd-manager-token
  namespace: argocd
  annotations:
    kubernetes.io/service-account.name: argocd-manager
type: kubernetes.io/service-account-token

The in-cluster secret is a secret that tells ArgoCD how to manage the cluster. Most of it isn’t really used, and I haven’t verified that everything in it is correct: since I run an ArgoCD in each of my clusters instead of having my main ArgoCD manage both, authentication to Kubernetes works differently anyway. However, the secret needs to be there, because I need to attach a label to it that differs on each node:

nodename=`uname -n`
env=`grep $nodename nodes.txt | awk '{print $2}'`
kubectl label secrets -n argocd in-cluster env=$env

The nodes.txt is simply a small file with lines of the form “<nodename> <environment>”. This environment is used when provisioning ArgoCD, to make it possible to have ArgoCD do different things on my different clusters based on this label. I have env=prod for production and env=dr for DR.
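As an illustration, with hypothetical node names, nodes.txt would look something like this:

prodnode1 prod
drnode1 dr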

Then, we are ready to provision our applications. I still need one little cheat:

kubectl apply -f $dir/../application/argocd/base/calicopolicy.yaml

This is because my calico application is set to sync before my argocd application. My calico application sets a default-deny policy, which basically makes ArgoCD stop working, as there’s no network policy that allows it to do what it needs to do.

So I just pre-apply the one from the argocd application.
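I won’t reproduce my actual calicopolicy.yaml here, but as a rough sketch of the idea, a policy that exempts the argocd namespace from the default-deny could look something like this plain Kubernetes NetworkPolicy (names and selectors are assumptions):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-argocd             # assumed name
  namespace: argocd
spec:
  podSelector: {}                # all pods in the argocd namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}                         # allow all ingress
  egress:
    - {}                         # allow all egress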

As you can see, there’s a whole lot of chicken-and-egg problems here, but methodical testing when installing (and reinstalling!) the DR node made it possible to create scripts making it work for me. It’s not particularly elegant, but it works.

The last step in the install script is to install the first ArgoCD applicationset. An applicationset is an ArgoCD construct that makes it possible to create multiple applications with slightly varying config. In my case, I use the label on the in-cluster secret to create only DR applications on the DR node, and only prod applications on the prod node.

kubectl create -f $dir/../application/bootstrapappset.yaml
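Without stealing too much from the next post: the mechanism behind this is ArgoCD’s cluster generator, which can select clusters by label. Very roughly, it works along these lines – the names, repo URL and path below are made up for illustration:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: bootstrap                # assumed name
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: dr              # only matches clusters whose secret carries env=dr
  template:
    metadata:
      name: 'platform-{{name}}'
    spec:
      project: default
      source:
        repoURL: git@github.com:example/bootstrap.git   # assumed URL
        targetRevision: HEAD
        path: application        # assumed path
      destination:
        server: '{{server}}'
        namespace: argocd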

But building the application sets will be the topic of my next blog post! Stay tuned!
