Disclaimer: Quite a bit of this is outdated. After a week or so, I decided to redo it all – mainly because I wanted IPv6 supported inside, and I thought that the fact that my multus macvlans supported IPv6 was proof I was all good. Turns out that with multus and macvlans, you live quite a bit outside of the Kubernetes concepts, and even the built-in network security mechanisms don't apply to them. I decided to redo it with IPv6 and without multus – because at least in k3s, you simply can not enable IPv6 after having created the cluster. Do not make the same mistake as me, go for IPv6 from the start. IPv4 is legacy; it's high time we start treating IPv6 as a first class citizen.
After my docker networking deep dive, I decided to explore another container orchestration system, Kubernetes. Kubernetes is more feature-rich, has more built-in security and isolation mechanisms, and comes with more networking options out of the box, but has somewhat of a learning curve. For a nerd, that might be seen as an added bonus – there's tons to learn! In this post I focus mostly on my choices and the concepts, and I'll go more in depth into the different pieces later. To give a full overview of my setup, I'll skip a fair bit of the installation/configuration details, which I guess I'll correct somewhat in followup posts. If you want me to go into details on some of this, either be patient and I might get around to it, or leave a comment (which I'll need to approve. There's way too much comment spam going around, so I won't leave it unmoderated).
Kubernetes Implementation
There are a few different implementations of Kubernetes. The real deal is K8s, and from there it goes down to more docker-overlay-like options such as Minikube. Since my infrastructure is pretty limited, but I still wanted a «real deal» feeling to it, I chose K3s, which is fairly lightweight but still gives you a good experience, tailored for more limited production workloads. Installing it is fairly simple, with one installation script that just tends to work.
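The whole installation boils down to the one-liner from the k3s documentation (do have a look at what you pipe into sh before running it):

$ curl -sfL https://get.k3s.io | sh -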
Networking
The next thing to choose is how to manage networking. Kubernetes has the concept of the Container Network Interface (CNI), that is, plugins that manage the networking for you. K3s comes with Flannel as the default CNI, which is simple and might have done the job. But since I am on a learning trip, I decided to choose something a bit more advanced, so I went with Calico. It is more feature-rich, comes with better security features, and all sorts of fun stuff to play with. Eager to get going, I didn't dive too deep into this. There are also options like Cilium, plus likely several others.
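If you want to go the Calico route too, the rough recipe is to install k3s without Flannel and then apply the Calico manifests. A sketch – the Calico version, and thereby the exact manifest URLs, will have moved on by the time you read this:

$ curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--flannel-backend=none --disable-network-policy" sh -
$ kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
$ kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/custom-resources.yaml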
I wanted to continue with my strategy of one outside-facing network where my DMZ-facing applications can be exposed, and one inside network which is Kubernetes only. Calico can't deliver that natively, and the solution to that is usually Multus, a CNI that you use as an addon to your main CNI. Multus enables you to have multiple network interfaces in a Pod. My internal jury is still out on to what extent this is a good idea, but so far I run a setup pretty similar to the one in my docker networking series, with an incoming DMZ (a multus network) and an outgoing DMZ (another multus network) where containers that should be allowed to speak to the world can attach. I chose to stick with macvlan on top of the 802.1q interfaces, as that gives a good isolation between the host and the kubernetes network.
hassio% ifconfig kdmz01
kdmz01: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::1e69:7aff:fe64:12e1  prefixlen 64  scopeid 0x20<link>
        inet6 2a01:799:393:f10b:1e69:7aff:fe64:12e1  prefixlen 64  scopeid 0x0<global>
        ether 1c:69:7a:64:12:e1  txqueuelen 1000  (Ethernet)
        RX packets 1304998  bytes 223638900 (223.6 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1200547  bytes 1075326210 (1.0 GB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
hassio% ifconfig kdmzout
kdmzout: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::1e69:7aff:fe64:12e1  prefixlen 64  scopeid 0x20<link>
        inet6 2a01:799:393:f10c:1e69:7aff:fe64:12e1  prefixlen 64  scopeid 0x0<global>
        ether 1c:69:7a:64:12:e1  txqueuelen 1000  (Ethernet)
        RX packets 15518299  bytes 38629815999 (38.6 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11650235  bytes 18813143467 (18.8 GB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
Then I need to define the networks in Kubernetes, which you do through a NetworkAttachmentDefinition. My incoming DMZ is defined with this configuration, which I apply the standard way with kubectl apply -f dmz.yaml:
hassio% cat dmz.yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: dmz-net
  namespace: traefik
spec:
  config: '{
    "cniVersion": "0.4.0",
    "type": "macvlan",
    "master": "kdmz01",
    "mode": "bridge",
    "ipam": {
      "type": "host-local",
      "ranges": [[{ "subnet": "192.168.29.0/24", "rangeStart": "192.168.29.10", "rangeEnd": "192.168.29.250", "gateway": "192.168.29.1" }]],
      "default-gateway": [ "192.168.29.1" ]
    }
  }'
I have another similar one that I call dmz-out, with the range 192.168.31.0/24.
A Pod will by default end up with a main interface managed by Calico; in addition, we can attach this network to a pod in its configuration, which Multus will handle. We'll cover that later.
Kubernetes will by default give you a fairly large and fairly flat network, managed by your CNI. I have created a pool, 10.151.0.0/16, that Calico will manage. For the internal network I saw no reason so far to go with static IP addresses, but rather let Calico hand them out on its own, as Kubernetes has built-in DNS so you look things up by name instead of using the addresses directly. The traditional security model of Kubernetes is not built so much around the kind of network separation I played with in docker; you rather attach network policies to a group of resources that is dynamically handled by your CNI. One of the key concepts for structuring this is, however, namespaces.
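For reference, such a pool can be declared to Calico with an IPPool resource. A sketch of what mine looks like – the encapsulation mode and the natOutgoing setting are assumptions that depend on your environment:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.151.0.0/16
  vxlanMode: Always       # assumption: vxlan encapsulation; ipipMode is the alternative
  natOutgoing: true       # SNAT pod traffic that leaves the pool
  nodeSelector: all()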
Namespaces
A namespace is a grouping of resources. You can loosely compare it to a docker stack, in that you can install and name things in it without much regard to what exists in other namespaces. There are by default no security implications in separating things into namespaces, but you can use a namespace as a grouping unit in a security rule, so it still helps. Using my previous example of this blog, it is now installed in the namespace wordpress.
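Creating one is about as simple as Kubernetes gets – kubectl create namespace wordpress does it, or declaratively:

apiVersion: v1
kind: Namespace
metadata:
  name: wordpress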
I decided to go with a different reverse proxy, Traefik, mainly because I wanted to try something different than nginx, and because it’s feature-rich and flexible. I run Traefik in its own namespace, traefik, and with a connection both to the internal network and to the DMZ, much like I did in my docker setup.
Pods
Pods are where you run your containers. Typically there will be one container per pod, and you start/stop the pod, not the containers. There are exceptions where you'd have several containers in a pod, but those I'll cover later – or in the next post; I have a feeling there will be several posts…
In my wordpress namespace, I have two pods, one for the wordpress application container and one for my mariadb container. But each of these need not be a single pod; there can be multiple, for redundancy and fault tolerance, so you'll generally specify them in a deployment, which manages a set of similar pods – and handles distribution over several nodes, restart strategy, scaling, and a ton of things I haven't gotten to play with yet, partly because my Kubernetes cluster is a tiny Intel NUC sitting on the top of a drawer trying to look innocent because of the WAF. So far, the deployments I have specified myself all consist of a single Pod, where one is terminated before the next one starts. All of these will of course need some permanent storage, which in Kubernetes is specified in the form of Persistent Volume Claims. That's a topic that likely warrants more of a deep dive.
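Just to give a flavour of it here: a claim like the wordpress-html one referenced below might look something like this (the size is a placeholder, and whether you need a storageClassName depends on your storage setup):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-html
  namespace: wordpress
spec:
  accessModes:
    - ReadWriteOnce       # a single node mounts it read-write
  resources:
    requests:
      storage: 5Gi        # placeholder size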
So, my WordPress Deployment looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    io.kompose.service: wordpress-vegard
  name: wordpress-app
  namespace: wordpress
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: wordpress-vegard
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.34.0 (cbf2835db)
        k8s.v1.cni.cncf.io/networks: '[
          { "name": "dmz-out", "namespace": "infrastructure", "ips": [ "192.168.31.13/24" ], "mac": "02:42:ac:11:00:04" }
          ]'
        mutating-webhook.network: "dmz-out"
      labels:
        io.kompose.service: wordpress-vegard
        app: wordpress
        webhook-network: "true"
    spec:
      containers:
        - env:
            - name: WORDPRESS_DB_NAME
              valueFrom:
                secretKeyRef:
                  name: database
                  key: MYSQL_DATABASE
            - name: WORDPRESS_DB_HOST
              valueFrom:
                configMapKeyRef:
                  name: wordpress-vegard-cm1
                  key: db-host
            - name: WORDPRESS_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: password
                  key: MYSQL_PASSWORD
            - name: WORDPRESS_DB_USER
              valueFrom:
                secretKeyRef:
                  name: user
                  key: WORDPRESS_DB_USER
          image: wordpress:latest
          name: wordpress-app
          volumeMounts:
            - mountPath: /var/www/html
              name: wordpress-html
            - mountPath: /usr/local/etc/php/conf.d/upload.ini
              name: wordpress-vegard-cm1
              subPath: upload.ini
      restartPolicy: Always
      volumes:
        - name: wordpress-html
          persistentVolumeClaim:
            claimName: wordpress-html
        - configMap:
            items:
              - key: upload.ini
                path: upload.ini
            name: wordpress-vegard-cm1
          name: wordpress-vegard-cm1
First, I specify some metadata, how many replicas I want, and some labels and annotations that you can use to do pretty cool things. The k8s.v1.cni.cncf.io/networks annotation is the one that tells Multus to connect another network to the Pod. To be able to use it in my external firewall rules, should I wish to, I hardcode the IPv4 address, while for IPv6 I let it use autoconfiguration. If I hardcode the MAC address, the IPv6 address will also be static, provided my network doesn't change. I could have opted for hard-coded IPv6 addresses and turned off autoconfiguration, but if my ISP should decide to give me another IPv6 prefix, at least all of my Kubernetes setup will reconfigure all by itself, leaving only external firewall rules and external DNS for me to worry about…. (I have some ideas on how to solve that with automation too, but that's for another day).
Then, I specify the container. There are some environment variables the docker image expects, then there's the docker image itself, and then there are some persistent volumes – which is a topic worthy of a blog post in itself. There are also two concepts that are good to know about:
- configMaps, which are a place to store application configuration, either in the form of key/value pairs or in the form of full configuration files that you can mount into the runtime environment of the container. I have used both here.
- Secrets (referenced through secretKeyRef above), which are much the same but geared toward secret values (passwords, encryption keys, etc). There's much more to those than what I have covered so far: you can encrypt them, or even store them in an external vault of your choosing. Courtesy of Keeper, I actually have Keeper Secrets Manager, which I tested in my backup solution and still run. That will allow me to test the Kubernetes External Secrets Operator for Keeper, and knowing myself, I'll get tempted to do so in the not-too-distant future.
I'll not go into the details of how you create and specify configmaps and secrets; there are some options there.
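For illustration, one imperative way of doing it with kubectl, matching the names used in the manifests here (the values are of course placeholders):

$ kubectl -n wordpress create secret generic database --from-literal=MYSQL_DATABASE=wordpress
$ kubectl -n wordpress create secret generic password --from-literal=MYSQL_PASSWORD=changeme
$ kubectl -n wordpress create secret generic user --from-literal=WORDPRESS_DB_USER=wordpress
$ kubectl -n wordpress create configmap wordpress-vegard-cm1 --from-literal=db-host=db.wordpress.svc.cluster.local --from-file=upload.ini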
For completeness, here's my Mariadb deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.34.0 (cbf2835db)
  labels:
    io.kompose.service: mariadb-vegard
    app: mariadb
  name: wordpress-db
  namespace: wordpress
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: mariadb-vegard
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.34.0 (cbf2835db)
      labels:
        io.kompose.service: mariadb-vegard
        app: mariadb
    spec:
      containers:
        - env:
            - name: MYSQL_DATABASE
              valueFrom:
                secretKeyRef:
                  name: database
                  key: MYSQL_DATABASE
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: password
                  key: MYSQL_PASSWORD
          image: mariadb
          name: wordpress-db
          volumeMounts:
            - mountPath: /var/lib/mysql
              name: wordpress-db
      restartPolicy: Always
      volumes:
        - name: wordpress-db
          persistentVolumeClaim:
            claimName: wordpress-db
This one is a bit simpler, as I don't expose it to multus; it shouldn't have a need to access the external world. However, by default it will still be able to, unless I add a network policy that blocks it – that I'll probably cover in a dedicated security-related blog post….
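A minimal sketch of what such a policy could look like – untested in this exact form, but the idea is to deny all egress from the mariadb pods; replies to allowed incoming connections are still permitted, since network policies are stateful:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mariadb-deny-egress
  namespace: wordpress
spec:
  podSelector:
    matchLabels:
      app: mariadb
  policyTypes:
    - Egress
  egress: []              # empty list: no egress allowed at all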
Services
You'll usually not configure the different parts of your systems to communicate directly with other pods; there's another abstraction layer called services. A service will usually live on a more static IP address, at least one more long-lived than the containers that die, restart and get new IP addresses all the time.
I specify two services here, my wordpress service and my mariadb service.
First, my wordpress service. Note the selector towards the bottom: it selects based on pod labels, and I had app: wordpress specified in my wordpress deployment specification, so it'll automatically find the Pods to connect to based on that label.
apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    app.kubernetes.io/name: wordpress
    app.kubernetes.io/version: 11.5.1
    helm.sh/chart: grafana-8.9.0
  name: wordpress
  namespace: wordpress
spec:
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: wordpress-app
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: wordpress
  sessionAffinity: None
  type: ClusterIP
My mariadb service is pretty similar:
apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    app.kubernetes.io/name: mysql
    app.kubernetes.io/version: 11.5.1
    helm.sh/chart: grafana-8.9.0
  name: db
  namespace: wordpress
spec:
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: wordpress-db
      port: 3306
      protocol: TCP
      targetPort: 3306
  selector:
    app: mariadb
  sessionAffinity: None
  type: ClusterIP
Now, here's another piece of magic. When I deploy this, I can, from my whole Kubernetes cluster, get to my wordpress instance's web port by connecting to the hostname wordpress.wordpress.svc.cluster.local on port 80. WordPress will use db.wordpress.svc.cluster.local:3306 for its DB connection (db being the name of the service above). The second wordpress in these names is the namespace. You might of course not want things exposed clusterwide, but that you'll handle with network policies.
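An easy way to convince yourself of this is to resolve the names from a throwaway pod – a one-off along these lines (busybox's nslookup is a bit temperamental, but usually does the job):

$ kubectl run -n wordpress dns-test --rm -it --restart=Never --image=busybox -- nslookup db.wordpress.svc.cluster.local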
There are some bits and pieces I haven't gone into the details of, like how to specify namespaces, configmaps, secrets and persistent volumes.
To expose it through traefik, I have a final piece of magic, an IngressRoute, that I deploy in the traefik namespace:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: wordpress-ingressroute
  namespace: traefik
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`vegard.blog.engen.priv.no`)
      kind: Rule
      middlewares:
        - name: redirect-to-https
      services:
        - name: wordpress
          namespace: wordpress
          port: 80
  tls:
    certResolver: letsencrypt
Traefik
I have installed traefik in its own namespace, traefik. I opted to use helm to install it, which will make it easier to update:
$ helm repo add traefik https://traefik.github.io/charts
$ helm repo update
$ helm install traefik traefik/traefik --namespace traefik
Had I not been such a beginner, I'd have fed it a -f values.yaml argument to have it automatically configured the way I wanted, but I ended up managing the Deployment manually:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: traefik
  labels:
    app: traefik
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
          { "name": "dmz-net", "namespace": "traefik", "ips": ["192.168.29.15/24"], "mac": "02:42:ac:11:00:03" }
          ]'
    spec:
      initContainers:
        - name: setup-routes
          image: busybox
          command:
            - sh
            - -c
            - |
              ip route del default
              ip route add default via 192.168.29.1 dev net1
              ip route add 10.0.0.0/8 via 169.254.1.1 dev eth0
              ipv6_prefix=`ip -6 address list scope global dev net1 | grep inet6 | awk '{print $2}' | cut -f1-4 -d':'`
              for i in `seq 1 100`; do ip -6 address add $ipv6_prefix:1:0:0:$i dev net1; done
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]
      containers:
        - name: traefik
          image: traefik:v3.3.2
          args:
            - "--api.insecure=true"
            - "--accesslog=true"
            - "--providers.kubernetescrd"
            - "--providers.kubernetescrd.allowCrossNamespace=true"
            - "--providers.kubernetesingress"
            - "--entrypoints.web.address=:80"
            - "--entrypoints.websecure.address=:443"
            - "--certificatesresolvers.letsencrypt.acme.email=your-email@example.com"
            - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
            - "--certificatesresolvers.letsencrypt.acme.dnschallenge.provider=linode"
            - "--certificatesresolvers.letsencrypt.acme.dnschallenge.delaybeforecheck=10"
            - "--certificatesresolvers.letsencrypt.acme.dnschallenge.resolvers=8.8.8.8:53,1.1.1.1:53"
          env:
            - name: LINODE_TOKEN
              valueFrom:
                secretKeyRef:
                  name: linode-dns-token
                  key: LINODE_TOKEN
          ports:
            - name: web
              containerPort: 80
              protocol: TCP
            - name: websecure
              containerPort: 443
              protocol: TCP
          volumeMounts:
            - mountPath: "/data"
              name: traefik-volume
      volumes:
        - name: traefik-volume
          persistentVolumeClaim:
            claimName: traefik-volume
As with wordpress, I expose it on a macvlan. Here, I have also shown a piece of magic I left out in the wordpress deployment. I use it there too, but I have some secret sauce that adds it (did I tell you there's material for more blog posts here too?).
The «init-container» section is a special container that runs before the traefik container starts. I have given it more rights, so that it is able to change things like networking configuration, which even root isn't allowed to do inside a container. In my docker networks I solved the same thing through docker events, but this is a more integrated and elegant way to solve the same problem. The init-container is shut down before the traefik container starts, but the networking remains set up, so that traefik's default route goes out through the macvlan, while 10.0.0.0/8 (all the kubernetes-internal networking ends up there) is routed to where the default route was previously going.
For good measure I add 100 sequential IPv6 addresses to use in AAAA records externally, but that's so far only eye candy – they all respond to all hostnames anyhow.
The traefik container has the web and websecure entrypoints specified, and I connected my wordpress IngressRoute to the websecure entrypoint. As has been the norm for years, I want my http calls for vegard.blog.engen.priv.no to redirect to https. For that, I'll use a Traefik Middleware:
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: redirect-to-https
  namespace: traefik
spec:
  redirectScheme:
    scheme: https
    permanent: true
I can use this in a second ingressroute for my wordpress:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: wordpress-ingressroute-http
  namespace: traefik
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`vegard.blog.engen.priv.no`)
      kind: Rule
      middlewares:
        - name: redirect-to-https
      services:
        - name: noop
          kind: TraefikService
There’s likely more elegant ways to do this, but this works.
There's also setup there for DNS challenges through my DNS provider, which is Linode. That functionality comes out of the box; I just need to provide it a token in a secret. The certificates will of course have to be persisted, in the volume mounted at /data, in the file /data/acme.json.
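The linode-dns-token secret referenced in the Deployment above can be created like this (with a real token from Linode, obviously):

$ kubectl -n traefik create secret generic linode-dns-token --from-literal=LINODE_TOKEN=<your-linode-api-token>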
The external part of this is of course DNS, port forwarding on your external router for IPv4, and opening the firewall for the IPv6 addresses, which are already in a public range.
More services
My other web services will run pretty similarly. I configure the functionality in its own namespace, and then I create IngressRoutes for traefik to handle the rest. As long as you configure your services correctly and your ingress routes point to the services correctly, magic just seems to happen, and a new site appears, ready to take traffic from the moment you point the DNS names at it.
Migration from and coexistence with my docker setup
Now, since I actually already have services running in docker, and since I only have one external IPv4 address, I have a problem. Where should ports 80 and 443 go?
For the transition period, I let them keep going to the docker nginx and handle the IPv4 traffic there. In nginx, I just swapped out the backend services and sent the traffic through traefik instead of into the previous docker stack. Elegant? Maybe not. It worked better with IPv6: there, migrated hostnames could point directly to my Kubernetes traefik setup, while non-migrated workloads could keep pointing to the docker nginx.
Conclusion? Summary?
This was fun. Before getting around to writing this blog post, I had played with a ton of Kubernetes features, some not mentioned yet. I have also migrated the rest of the services from docker so I could point the firewall port forward to traefik instead of docker.
Is it any better than before? Not all that much. I am essentially doing the same. Some things are more elegant, and there are more features in Kubernetes that will allow me to improve parts of this, perhaps by adding redundancy here and there.
I really like the persistent volumes. I use ZFS as storage, and there are all these neat features that make Kubernetes provision the volumes automagically once you configure it in Kubernetes (more on that later).
I also like the Traefik setup and the ingressroutes that just makes things appear. I am pretty sure it’d even be possible to create automatic hooks here and there to have DNS records appear in my external DNS provider, via the API.
It’s more complex than my Docker setup, but I can easily extend it more and make it more resilient. And I’m a geek. I love to tinker. That’s a reason all by itself.