Kubernetes deep dive part 2: not all ideas that seem good at the start end up being good…


After a week of playing around and tinkering with stuff, I decided to make my Traefik instance highly available, so that I could restart it without my web services going down. That led to a lot of discoveries and a lot of reconsidering of concepts. Rather than jumping straight to the conclusions, I'll let you follow my steps along the way of my learning path.

When you want redundant containers, you usually need some kind of load balancing. Services are just that: a kind of load balancer. So far I had only used them internally, and let a single Traefik instance handle the ingress from the outside world. That is the default Service type, called ClusterIP. As the name indicates, it gives you an IP address from your cluster that other parts of the cluster can find (via Kubernetes' internal DNS) and connect to. When you want to access a service from the outside world, you need another type.
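
For reference, this is roughly what such a plain ClusterIP Service looks like; the names and ports below are placeholders, not something from my actual setup:

apiVersion: v1
kind: Service
metadata:
  name: my-app            # placeholder name
  namespace: my-namespace # placeholder namespace
spec:
  type: ClusterIP         # the default; can be omitted
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: my-app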

The NodePort Service type is quite simple: it just exposes the service on a port on the node it's running on. Since my workloads are mostly external-facing, and not used by other things on the node, I decided to go with the LoadBalancer service type.
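
For completeness, a NodePort Service is the same sketch with a different type; the nodePort value is just an example from the default 30000-32767 range:

apiVersion: v1
kind: Service
metadata:
  name: my-app            # placeholder name
  namespace: my-namespace # placeholder namespace
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080       # example port in the default range
  selector:
    app: my-app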

There are quite a few load balancers to choose from. K3s has a built-in one, ServiceLB, but I decided to go with MetalLB, which is commonly used when running Kubernetes on your own servers rather than in, for example, AWS. To use it, you first have to get ServiceLB out of the way; they can't coexist. You disable it with a startup flag to k3s:

# sudo vi /etc/systemd/system/k3s.service

You’ll need to add to the startup arguments, approximately like this (leave all other startup arguments in place):

ExecStart=/usr/local/bin/k3s \
    server \
    .....
    '--disable=servicelb' \
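
After editing the unit file, reload systemd and restart k3s for the flag to take effect:

$ sudo systemctl daemon-reload
$ sudo systemctl restart k3s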

Then you need to install MetalLB. There are a few installation options, but having gotten ServiceLB out of the way, I decided to keep it simple with Helm:

$ kubectl create namespace metallb-system
$ helm repo add metallb https://metallb.github.io/metallb
$ helm install -n metallb-system metallb metallb/metallb
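
Before moving on, check that the MetalLB controller and speaker pods come up:

$ kubectl get pods -n metallb-system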

Then you need a pool of addresses for the load balancers. Make sure it doesn't overlap with anything else on your network; MetalLB has no way of detecting that, since we're now dealing with IP addresses outside of Kubernetes.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-ip-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.29.5-192.168.29.20

Note that you still need your kdmz01 with the 802.1q setup, as these addresses still live on the DMZ, and you need to be able to reach them from the outside world.

For the outside world to actually reach these addresses, you need to make sure the load balancer addresses are announced. You have a choice between BGP and ARP (layer 2 mode) as announcement methods, but as I'm on a single-node setup, ARP is the simple way.

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: public-l2-adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - public-ip-pool
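
Apply both manifests (the file names here are just whatever you saved them as) and check that MetalLB has accepted them:

$ kubectl apply -f metallb-pool.yaml -f metallb-l2adv.yaml
$ kubectl get ipaddresspools,l2advertisements -n metallb-system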

And now I’m ready to create my first load balancer:

apiVersion: v1
kind: Service
metadata:
  name: traefik-lb2
  namespace: traefik
  annotations:
    metallb.universe.tf/address-pool: public-ip-pool
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  loadBalancerIP: 192.168.29.7
  externalIPs:
  - 192.168.29.7
  ports:
  - name: web
    port: 80
  - name: websecure
    port: 443
  selector:
    app: traefik
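
Once applied, the service should be handed the requested address; a quick look at the EXTERNAL-IP column confirms it:

$ kubectl get svc -n traefik traefik-lb2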

The load balancer service will, like the other services, connect to the internal IP addresses of the backends, and not the IP addresses I had set up with macvlan and Multus. I initially thought Multus and macvlans would still be part of my setup, but since MetalLB doesn't do any NAT, I ended up with the return traffic going out on the macvlan interface instead of back out through the load balancer! Asymmetric routing is bad on multiple levels, and here it made things not work at all. So Multus had to go from Traefik.

It's also about time to make Traefik redundant. I could scale my Deployment up to more than one replica, but since I have persistent storage, that comes with complications. I decided to just have each Traefik instance handle its own certificates and be completely self-contained, with its own volume. There's another workload type that's much better suited in this case: the StatefulSet. A StatefulSet lets each pod get its own volume (via volume claim templates), which is what I wanted. Other than that, it resembles my previous Traefik Deployment quite a bit.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: traefik
  namespace: traefik
  labels:
    app: traefik
spec:
  serviceName: traefik
  replicas: 2
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      containers:
      - name: traefik
        image: traefik:v3.3.2
        args:
        - "--api.insecure=true"
        - "--accesslog=true"
        - "--providers.kubernetescrd"
        - "--providers.kubernetescrd.allowCrossNamespace=true"
        - "--providers.kubernetesingress"
        - "--entrypoints.web.address=:80"
        - "--entrypoints.websecure.address=:443"
        - "--certificatesresolvers.letsencrypt.acme.email=vegard@engen.priv.no"
        - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
        - "--certificatesresolvers.letsencrypt.acme.dnschallenge.provider=linode"
        - "--certificatesresolvers.letsencrypt.acme.dnschallenge.delaybeforecheck=10"
        - "--certificatesresolvers.letsencrypt.acme.dnschallenge.resolvers=8.8.8.8:53,1.1.1.1:53"
        - "--metrics.prometheus=true"
        - "--metrics.prometheus.entryPoint=metrics"
        env:
        - name: LINODE_TOKEN
          valueFrom:
            secretKeyRef:
              name: linode-dns-token
              key: LINODE_TOKEN
        ports:
        - name: web
          containerPort: 80
          protocol: TCP
        - name: websecure
          containerPort: 443
          protocol: TCP
        volumeMounts:
        - mountPath: "/data"
          name: traefik-volume
      volumes:
      - name: traefik-volume
        persistentVolumeClaim:
          claimName: traefik-volume
  volumeClaimTemplates:
  - metadata:
      name: traefik-volume
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi # Adjust size as needed
      storageClassName: zfs-storage-nas

The main difference is that instead of referencing pre-created PVCs, the volumeClaimTemplates section creates one on the fly for each pod. The rest of the configuration is covered in part 1.

This was actually pretty easy, and looks like this when inspecting it:

$ kubectl get pods -n traefik
NAME        READY   STATUS    RESTARTS   AGE
traefik-0   1/1     Running   0          24h
traefik-1   1/1     Running   0          24h
$ kubectl get pvc -n traefik
NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
traefik-volume-traefik-0   Bound    pvc-c0c26955-ccfc-44a7-87ee-45d5671f7e00   1Gi        RWO            zfs-storage-nas   <unset>                 4d9h
traefik-volume-traefik-1   Bound    pvc-7b45ffcd-d76f-457d-ba6d-ad962960180e   1Gi        RWO            zfs-storage-nas   <unset>                 4d9h

As you can see, we're just getting numbered resources, and it's easy to scale up to even more instances should we need to. Running on a single node, though, that's probably not necessary.
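
Scaling is a one-liner; the StatefulSet controller will create a traefik-2 pod and a matching PVC from the claim template:

$ kubectl scale statefulset traefik -n traefik --replicas=3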

The MetalLB load balancer looks like this:

$ kubectl describe service -n traefik traefik-lb2
Name: traefik-lb2
Namespace: traefik
Labels: <none>
Annotations: metallb.io/ip-allocated-from-pool: public-ip-pool
metallb.universe.tf/address-pool: public-ip-pool
Selector: app=traefik
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.235.237
IPs: 10.43.235.237
External IPs: 192.168.29.7
Desired LoadBalancer IP: 192.168.29.7
LoadBalancer Ingress: 192.168.29.7 (VIP)
Port: web 80/TCP
TargetPort: 80/TCP
NodePort: web 32516/TCP
Endpoints: 10.151.253.201:80,10.151.253.196:80
Port: websecure 443/TCP
TargetPort: 443/TCP
NodePort: websecure 32605/TCP
Endpoints: 10.151.253.201:443,10.151.253.196:443
Session Affinity: None
External Traffic Policy: Local
Internal Traffic Policy: Cluster
HealthCheck NodePort: 31748
Events: <none>

Note: You will need one IPv4 address (and possibly an IPv6 address) set up on the interface outside of the Kubernetes setup, on the node side of this. This seems to be needed when you are using 802.1q, but it's not well documented anywhere. Without it, your load balancers just stop working, and it took me a while to figure that out, as things seemed to be mostly working, just not fully and not from outside the node.
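
As a sketch of what I mean, assuming the DMZ VLAN interface on the node is called vlan29 (yours will be named differently), it's just a matter of giving the node an address of its own on that network:

# interface name and address are placeholders; adapt them to your own DMZ
$ sudo ip addr add 192.168.29.2/24 dev vlan29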

IPv6

Having done IPv4, it was time to do IPv6. At that point I was totally convinced that MetalLB could do NAT at the edge and let the pods keep their IPv4-only addresses, but it can't. I wasted a lot of time trying to make that work. There are other kinds of load balancers that might do it, like LoxiLB. Since I am a perfectionist, I want full IPv6, and since I generally dislike NAT, I decided to try to have my pods get IPv6 addresses as well…

I am curious by nature. I tend to jump into things head-first. Sometimes it pays off. Sometimes I have to backtrack and redo things, but usually not before learning some stuff. This was one such case: at least in K3s, you simply can not switch to dual-stack after the fact. So I needed to actually uninstall K3s and reinstall it.
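
Tearing K3s down is at least simple; on a default server install the installer leaves an uninstall script on the node (back up your YAMLs and anything else you care about first):

$ sudo /usr/local/bin/k3s-uninstall.sh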

To make K3s dual-stack, you need to start it the very first time with flags like these (set in k3s.service, as we did when disabling ServiceLB):

    '--cluster-cidr=10.42.0.0/16,fd42::/48' \
    '--service-cidr=10.43.0.0/16,fd43::/112' \

Here I have opted for IPv6 Unique Local Addresses (ULA), which are IPv6's version of private addresses. I could have given it part of my ISP-provided /56 network, but it's not possible to change this after the fact, and my ISP might for whatever reason decide to give me a new IPv6 prefix, and I'd be in deep trouble then. So if you are on ISP-provided (i.e. DHCPv6-delegated) IPv6 addresses, go with ULA here, even if your ISP promises the prefix won't change. They might also make mistakes and lose your assignment.

Having done this, I needed to reinstall everything, but mostly as I did previously, so I won’t go into detail here.

Here it paid off to have kept my YAMLs neatly organized for documentation purposes, so I could just re-apply them. While it would probably have been possible to reuse the same storage volumes, I ended up getting new ones. The old ones are still there, since I have chosen the Retain policy on my storage classes. That means Kubernetes never deletes any volumes, but it also means I occasionally need to clean up volumes I no longer need.

Then I needed to enable IPv6 in Calico. Spare yourself some trouble and do this when you install Calico, and save yourself another reinstall. It is possible to do it in a running cluster, though, but all your pods will have to be restarted to get IPv6 addresses.

I struggled a lot with this. Calico can be a beast: in my opinion it goes out of its way to make sure things stay consistent, and it will often revert whatever changes you make, so it's wise to get this right the first time. So do yourself a favour and follow the official guide when you install Calico. I didn't, but this time I'll spare you my troubles.

Once Calico is installed with IPv6 support, you need to add an IPv6 pool, if one isn't there already:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv6-ippool   # the pool needs a name; this one is just a suggestion
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 122
  cidr: fd00::/64
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
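
If you have calicoctl installed, you can verify that the pool is in place:

$ calicoctl get ippool -o wide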

Once you start or restart pods, they should get IPv6 addresses. For the Traefik StatefulSet, for example, a rolling restart is enough:
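
$ kubectl rollout restart statefulset traefik -n traefik

Describing one of the pods afterwards shows both an IPv4 and an IPv6 address: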

$ kubectl describe pod -n traefik traefik-1 
Name: traefik-1
Namespace: traefik
Priority: 0
Service Account: default
Node: hassio/192.168.1.153
Start Time: Sat, 22 Feb 2025 11:16:03 +0100
Labels: app=traefik
apps.kubernetes.io/pod-index=1
controller-revision-hash=traefik-f7f557d45
statefulset.kubernetes.io/pod-name=traefik-1
Annotations: cni.projectcalico.org/containerID: 171e65d2a81cf9613e46ee1919991d500b45f2b6273b26c7e7e1b64a2d022ee3
cni.projectcalico.org/podIP: 10.151.253.201/32
cni.projectcalico.org/podIPs: 10.151.253.201/32,fd00::b8f3:e78:1a73:9c74/128
kubectl.kubernetes.io/restartedAt: 2025-02-22T11:15:52+01:00
Status: Running
IP: 10.151.253.201
IPs:
IP: 10.151.253.201
IP: fd00::b8f3:e78:1a73:9c74

And at this point, you are ready to tackle MetalLB load balancers with IPv6 support.

You first need an IPv6 MetalLB pool:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-ipv6-pool
  namespace: metallb-system
spec:
  addresses:
  - 2a01:799:393:f10b:2::/80

Then you’ll need to update the MetalLB config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config
  namespace: metallb-system
data:
  config: |
    address-pools:
    - name: public-ipv4-pool
      protocol: layer2
      addresses:
      - 192.168.29.5-192.168.29.20
    - name: public-ipv6-pool
      protocol: layer2
      addresses:
      - 2a01:799:393:f10b:2::/80

If it does not exist, create it.

My DMZ network is 2a01:799:393:f10b::/64 (as assigned by my router and ISP). I haven't found a good way to make this dynamic, so if your ISP changes the prefix, changes will be needed here too…

Then I need to update the L2 advertisement:

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: public-l2-adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - public-ip-pool
  - public-ipv6-pool

And at this point we should be ready for our first IPv6 load balancers. I have opted for single-stack load balancers. I actually give each of my virtual hosts its own IPv6 load balancer, but as I only have one external IPv4 address, I can't do that for IPv4. This is eye candy only; I like it that way, but it doesn't make much of a functional difference, really. But here's my first IPv6 load balancer, for this blog:

apiVersion: v1
kind: Service
metadata:
  name: traefik-vegardblog
  namespace: traefik
  annotations:
    metallb.universe.tf/address-pool: public-ipv6-pool
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  ipFamilyPolicy: SingleStack
  ipFamilies:
  - IPv6
  loadBalancerIP: 2a01:799:393:f10b:2::1
  ports:
  - name: web
    port: 80
  - name: websecure
    port: 443
  selector:
    app: traefik

Note that the IPv6 address of course needs to come from the pool. I have opted to hard-code the addresses, as I don't want them to change, even if I delete and recreate a load balancer. This is where you point your DNS entries, of course, and having to update those (and potentially your firewall) is unneeded complexity.
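
A quick way to verify the IPv6 path end to end is a curl forced onto IPv6 against one of the virtual hosts (the hostname here is just a placeholder):

$ curl -6 -I https://blog.example.com/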

If all is good, your Traefik setup should now be highly available, and reachable over both IPv4 and IPv6!
