After having created my docker DMZ in part 1, I realized that if I just connected the networks of the docker containers I wanted to reach from the internet directly to the firewall container, I could avoid exposing their ports to the underlying machine altogether, thereby reducing the number of open ports on the server itself. I will use this blog's network as an example, but in reality I did exactly the same for a number of externally and internally exposed services. Also, since I have already talked about ipv6 addressing, I have decided to leave out the ipv6 part of this configuration. While the ipv6 addresses are there, my nginx proxy so far talks to the services over ipv4, so the ipv6 networking sits largely unused in the backend part of my setup.
I started off by explicitly creating my service networks like this:
networks:
  wordpress_default:
Then I'd specify them as the network for the services themselves:
services:
  wordpress_vegard:
    image: wordpress:latest
    restart: always
    links:
      - mariadb_vegard:mysql
    .....
    volumes:
      .....
    networks:
      wordpress_default:
And in my infrastructure docker-compose, which hosts my firewall docker container, I would specify the network as external:
networks:
  wordpress_wordpress_default:
    external: true
Note the extra wordpress. That's because that is what the network from my docker-compose setup, which lives in my wordpress catalog, ends up being called in docker.
Then I can connect them to the firewall docker container the same way that I do for the wordpress container itself, by specifying wordpress_wordpress_default as a network for the container.
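Concretely, a rough sketch of what the firewall service ends up looking like in my infrastructure docker-compose (everything unrelated to these networks is elided):
services:
  firewall:
    ....
    networks:
      dmz_firewall:
      wordpress_wordpress_default: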
My firewall scripts would now have to do the following:
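# Look up the wordpress container's address, and the firewall's own address, on the shared wordpress network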
WORDPRESS=`ping -c 1 wordpress_wordpress_vegard_1.wordpress_wordpress_default | head -1 | awk '{print $3}'|tr -d '\(\)'`
WORDPRESS_SOURCE=`ping -c 1 infrastructure_firewall_1.wordpress_wordpress_default | head -1 | awk '{print $3}'|tr -d '\(\)'`
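# DNAT port 1081 on the firewall's DMZ address to port 80 on wordpress, SNAT it so the replies go back via the firewall, and allow the forwarded traffic from nginx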
iptables -t nat -I PREROUTING -d infrastructure_firewall_1.infrastructure_dmz_firewall -m tcp -p tcp --dport 1081 -j DNAT --to-destination $WORDPRESS:80
iptables -t nat -I POSTROUTING -d $WORDPRESS -m tcp -p tcp --dport 80 -j SNAT --to-source $WORDPRESS_SOURCE
iptables -I FORWARD -s nginx -d $WORDPRESS -m tcp -p tcp --dport 80 -j ACCEPT
Notice that I use ping and not a DNS lookup tool. There's no more magic to it than that my firewall image contains ping but not host or nslookup, so I just used what I had instead of installing extra things in my firewall.
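For reference, with an iputils-style ping the first line of output looks roughly like this (the exact format varies between ping implementations):
$ ping -c 1 wordpress_wordpress_vegard_1.wordpress_wordpress_default | head -1
PING wordpress_wordpress_vegard_1.wordpress_wordpress_default (10.100.5.11) 56(84) bytes of data.
The third field is the address in parentheses, which awk picks out and tr strips down to a bare ip address.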
Since nginx was already pointing to port 1081 on my firewall, this is all I needed to get it to work. The difference is that the traffic now never leaves my docker networks, so I could remove this part from my wordpress configuration:
ports:
  - "1081:80"
But I really dislike NAT. Even though it is sometimes claimed to be a security feature, it more often than not also hides details from you, opening the door to human error instead. So I dived into another rabbit hole – how can I use the real ip addresses in the nginx configuration, and have the traffic routed via my firewall?
It was around this time, after a lot of bringing docker stacks up and down, that I had my first case of a network deciding to choose a different subnet, and I realized that in docker, this is bound to happen if you play around with things enough. And being nerds, this is what we do, isn't it?
Docker just magically selecting subnets for its networks is neat, and it works – as long as you stay inside docker. But I already had internal docker networks that overlapped with some of the networks on my physical network, so I bit the bullet and decided to hard code the subnets. For our example application, my wordpress, this is how I did it:
networks:
  wordpress_default:
    ipam:
      config:
        - subnet: "10.100.5.0/24"
          ip_range: "10.100.5.0/24"
          gateway: "10.100.5.1"
But not only can networks move around – it's also not guaranteed that a host's IP address on the network remains the same, so I hardcoded that too:
services:
  wordpress_vegard:
    ....
    networks:
      wordpress_default:
        ipv4_address: 10.100.5.11
I don't care what ip address my firewall docker container gets on the network: it's not going to change until I restart the container, and when the container is restarted, the firewall config will be applied again.
Then came the fun part – how to get routing to work? It turns out there's no really good way to create a routing entry via docker-compose – it would have to be done from within the docker container itself. There was also no easy way to make the firewall the default route of the network – once I bring up the dmz_firewall network, the bridge interface just magically takes whatever ip address I have specified as the default gateway for itself, so I can't assign that ip address to the firewall container.
So, I had to bite the bullet and have my nginx container change the routing as part of its startup scripts. This means I had to give it a few extra permissions, because ip addresses and routing tables normally cannot be changed from inside a container, even if the user inside the container is root. To achieve this, you need to do:
services:
  nginx:
    ....
    cap_add:
      - NET_ADMIN
This was one of the extra permissions I had to allow the firewall to have as well, and giving it to the firewall is a bit more acceptable than giving it to my nginx. But I bit this bullet too.
Then there was another rabbit hole to dive into: how to make the nginx-proxy-manager container run extra scripts on startup. I didn't want to create my own image, but rather use the standard one and have my additions specified via docker-compose. nginx-proxy-manager uses the s6-overlay mechanism to run its startup scripts – and here is how I ended up doing it:
In my infrastructure catalog on the docker host, I created a subdirectory nginx to hold the things I want to inject into the docker container:
vegard@server:~/docker/infrastructure$ ls -lR nginx
nginx:
total 1
drwxr-xr-x 2 vegard vegard 5 Jan 22 07:48 route
drwxr-xr-x 2 vegard vegard 3 Jan 22 07:53 service
nginx/route:
total 14
-rwxr-xr-x 1 root root 101 Jan 22 07:25 route.sh
-rw-r--r-- 1 vegard vegard 8 Jan 22 07:46 type
-rwxr-xr-x 1 vegard vegard 63 Jan 22 07:47 up
nginx/service:
total 1
-rw-r--r-- 1 vegard vegard 0 Jan 22 07:53 route
vegard@server:~/docker/infrastructure$ cat nginx/route/type
oneshot
vegard@server:~/docker/infrastructure$ cat nginx/route/route.sh
#!/bin/sh
apt -y update
apt -y install net-tools
route add -net 10.100.0.0/16 gw firewall # portainer
vegard@server:~/docker/infrastructure$ cat nginx/route/up
# shellcheck shell=bash
/etc/s6-overlay/s6-rc.d/route/route.sh
These files I mount into nginx like this:
volumes:
  - nginx_data:/data
  - nginx_letsencrypt:/etc/letsencrypt
  - ./nginx/route:/etc/s6-overlay/s6-rc.d/route:ro
  - ./nginx/service/route:/etc/s6-overlay/s6-rc.d/user/contents.d/route:ro
Explanation: Scripts are put into separate directories under /etc/s6-overlay/s6-rc.d. I named my directory route, since I am modifying routing. In this directory there has to be a type file with the word oneshot as its only content. Then there is a file up that holds the reference to the script you want to run; if the script is longer than one command, it's best to do it like that. My route.sh needed to install the route tool (from the package net-tools) before it can finally route 10.100.0.0/16 to the firewall. To have this oneshot run at boot time, there needs to be a file named the same as the directory in /etc/s6-overlay/s6-rc.d/user/contents.d/, and it can just be an empty file.
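If you want to recreate the same layout from scratch, the gist of it is something like this (run from the infrastructure catalog):
mkdir -p nginx/route nginx/service
printf 'oneshot\n' > nginx/route/type        # the service type s6-overlay expects
cat > nginx/route/route.sh <<'EOF'
#!/bin/sh
apt -y update
apt -y install net-tools
route add -net 10.100.0.0/16 gw firewall
EOF
cat > nginx/route/up <<'EOF'
# shellcheck shell=bash
/etc/s6-overlay/s6-rc.d/route/route.sh
EOF
chmod +x nginx/route/route.sh nginx/route/up
touch nginx/service/route                    # the empty file, named after the directory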
After rebuilding the stack, 10.100.0.0/16 – which contains all the inside docker networks I need to reach from the DMZ – is now routed through the firewall.
I can now specify 10.100.5.11:80 (the ip address I assigned to my wordpress instance container) as the backend for vegard.blog.engen.priv.no, and it will be nicely routed – without NAT!
Oh, not so fast! That didn't work. The wordpress container of course does not know where to route the return traffic. Rather than going around injecting return routes into all the inside docker containers, I grudgingly accepted doing a source NAT onto the firewall's ip address in each network. Since all the 10.100.0.0/16 networks on the firewall should be handled the same way, I just created a small addition to my firewall script:
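# SNAT traffic leaving via every interface that holds a 10.100.x.x address (skipping anything matching 10.100.1) to the firewall's own address on that interface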
for i in `ip address list | grep 10.100 | grep -v 10.100.1 | awk '{print $7}'`; do
ip=`ip address list $i |grep inet | awk '{print $2}' | cut -f 1 -d'/'`
iptables -t nat -I POSTROUTING -o $i -j SNAT --to-source $ip
done
And then I can get rid of all my other NAT rules for the inside container networks on the firewall!
I am, however, left with NAT for my SSH bastion host that I mentioned in part one. Since I don't particularly want my bastion host, which I am going to ssh into, to have more tools than absolutely necessary, nor any extra rights, I cannot modify the routing there. So I am left with the only option for now: to let it ssh to the firewall's interface ip address itself, and have the firewall handle it with NAT.
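In practice this looks much like the wordpress rules earlier in this post. A hypothetical sketch, where the variables and the port are placeholders and not my actual configuration:
# Hypothetical: ssh from the bastion to the firewall's own address gets DNAT'ed
# to an internal host, and SNAT'ed so the replies come back via the firewall.
iptables -t nat -I PREROUTING -s $BASTION -d $FIREWALL_DMZ_IP -m tcp -p tcp --dport 22 -j DNAT --to-destination $INTERNAL_HOST:22
iptables -t nat -I POSTROUTING -d $INTERNAL_HOST -m tcp -p tcp --dport 22 -j SNAT --to-source $FIREWALL_INSIDE_IP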
And I was satisfied with myself for a while, until I remembered the default gateway on each of these networks – which happens to be the ip address of the bridge interface on the underlying host. Through this interface, you can reach every port on the underlying host that listens on all ip addresses (0.0.0.0) – including SSH. Now, I could of course go and change the configuration of the underlying host services – or perhaps create iptables rules on the docker host. And that would likely have been easier. But being on a nerdy quest to see what I can do with docker networking, I decided to see what could be done to harden the docker setup itself. That, however, ended up being large enough to be called part three, which I'll write soon! Stay tuned!