Docker networking part four – hacking around Docker limitations


After part 3, my setup was pretty good, and I was pretty sure I had come to the end of the road. There was just one thing bugging me: I needed NAT (MASQUERADE) in my firewall to work around the fact that docker's routing table management is pretty limited. And with a pure docker setup, that really was the end of the line: there was no way I could inject a route anywhere. I might have gotten the default route to point back to the firewall, but letting services reach the frontend too was bound to lead to asymmetric routing. And that's not something a firewall guy will accept.

Then I stumbled upon docker events, and the idea of doing it all outside of docker. Because you can, in fact, reach and modify the internal networking of docker from the outside.

The key concept is listening for docker events. It turns out there is a pretty basic service someone has made: it listens for containers starting and acts upon properties, in the form of labels, that describe what you want done at start. I had to extend it with a few labels to handle my IPv6 setup, but other than that I was good to go.

After installing the service, which is small enough that you can read through it and verify that it only does what it says, you add some configuration to the containers you want it to handle:

labels:
  docker-events.route: "add 10.0.0.0/8 via 10.100.6.250"
  docker-events.ipv6-route: "add aaaa:bbbb:cccc:ddb0::/60 via aaaa:bbbb:cccc:dda6::250"
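One thing to be aware of: labels are baked into the container at creation time, so after adding or changing them, the container has to be recreated (not just restarted) for the listener to see them on the next start event. Here nginx stands in for whatever service carries the labels:

# Recreate the container so the new labels (and a fresh start event) take effect
docker compose up -d nginx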

This one routes the networks I have dedicated to the internal side of the DMZ back to the firewall, so that return traffic works without NAT. The default route still points out through my dockerdmz interface, so the frontend of the DMZ is routed towards my main gateway and out again.

I have opted to use real IPv6 addresses from my own address space in the docker-internal networks, but it would have worked equally well to use IPv6's equivalent of the private network ranges, ULA, as these addresses should never hit the outside of my network. In my examples, aaaa:bbbb:cccc:dd00::/56 denotes my ISP-delegated /56 (which incidentally gives me 256 networks – enough to play with for a while). For IPv4, I consider myself lucky not to be behind CGNAT, so I at least have one public IP address. In my docker setup, I use 10.100.0.0/16 for my application networks and 10.101.0.0/16 for the networks on the internal side of the DMZ. In my local network, I use 192.168.0.0/16, giving me 256 VLANs to play with – some of which extend into docker (the dmz, the dockerdmz and the hassio backend network). For IPv6, I use aaaa:bbbb:cccc:dda0::/60 for my application networks, which gives me 16 of them. Enough for now, but I might have to dedicate more ranges. For the DMZ side, I dedicate aaaa:bbbb:cccc:ddb0::/60 to the internal networks.

With this setup, the routing tables I need are pretty simple: everything on the DMZ side of the firewall needs to route 10.100.0.0/16 and aaaa:bbbb:cccc:dda0::/60 to the firewall, and on the application side, I need to route 10.101.0.0/16 and aaaa:bbbb:cccc:ddb0::/60 to the firewall.
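In practice the labels just expand to plain ip route commands inside each container's network namespace. For a container on the internal DMZ side, the mirror image of the label above would amount to something like this sketch (the gateway addresses 10.101.6.250 and aaaa:bbbb:cccc:ddb6::250 are hypothetical firewall legs on that segment, made up for illustration):

# What the listener would effectively run for a DMZ-side container (sketch,
# hypothetical gateway addresses):
ip route add 10.100.0.0/16 via 10.101.6.250
ip -6 route add aaaa:bbbb:cccc:dda0::/60 via aaaa:bbbb:cccc:ddb6::250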

With this setup, I am also able to remove the extra user-side permissions I had given my nginx container, and instead manage its routing the same way as the rest of my docker setup.

So, how does this really work?

The docker-events listener will listen for container start events:

while true; do
    docker events --filter type=container --filter event=start \
        --format '{{.Status}}.{{.Actor.Attributes.name}}' --since=15s | while read name; do
        echo "Event has been caught: $name"

For every container start, it looks for the interesting labels on the container and performs the actions specified. Here is the part adding IPv4 routes:

        # Read the route label (empty if the container doesn't carry it)
        values=$(docker inspect $coname --format '{{ index .Config.Labels "docker-events.route" }}')

        if [ ! -z "$values" ]; then
            # The PID is what lets us enter the container's network namespace
            coPID=$(docker inspect --format '{{.State.Pid}}' $coname)

            # $splitter is set elsewhere in the service; it separates multiple routes in one label
            IFS=$splitter
            for routestr in $values; do
                echo "$coname: Found container assigned route-label and applying it: $routestr"
                cmd="nsenter -n -t $coPID ip route $routestr"
                sh -c "$cmd"
            done
        fi

The magic command is nsenter. It executes the command in the network namespace of the given PID of the docker container (-t $coPID), which I found with a suitable docker inspect command. The original service I found doesn't support IPv6, but it was trivial to replicate this for IPv6:

        values=$(docker inspect $coname --format '{{ index .Config.Labels "docker-events.ipv6-route" }}')

        if [ ! -z "$values" ]; then
            coPID=$(docker inspect --format '{{.State.Pid}}' $coname)

            IFS=$splitter
            for routestr in $values; do
                echo "$coname: Found container assigned ipv6-route-label and applying it: $routestr"
                cmd="nsenter -n -t $coPID ip -6 route $routestr"
                sh -c "$cmd"
            done
        fi
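To check that a route actually landed, you can enter the namespace manually from the host in exactly the same way. A quick sketch, using the nginx container from my examples:

# Find the container's PID, then list its routing tables from the host
PID=$(docker inspect --format '{{.State.Pid}}' nginx)
nsenter -n -t "$PID" ip route show
nsenter -n -t "$PID" ip -6 route show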

Now, I am all set to remove NAT from my firewall scripts. After some polishing, it is really neat and easily manageable. Note: I am using the docker name resolution mechanism to get the correct IP addresses in the firewall scripts. This might leave some rules unloaded if some services are down while the firewall restarts, so I might opt to hardcode the addresses later. Or even better, use docker events to notice when containers come up and then reload the firewall rules (see the sketch after the script below). But for now, I can live with this.

#!/bin/bash


function load_ipv4_rules {
    # Specific rules
    iptables -F IPV4_DOCKERFW_RULES

    iptables -I IPV4_DOCKERFW_RULES -s nginx -d homeassistant.homeassistant_hassio -m tcp -p tcp --dport 8123 -j ACCEPT
    iptables -I IPV4_DOCKERFW_RULES -s nginx -d nextcloud-app-1.nextcloud_nextcloud_default -m tcp -p tcp --dport 80 -j ACCEPT
    iptables -I IPV4_DOCKERFW_RULES -s nginx -d plex.pms_plex_default -m tcp -p tcp --dport 32400 -j ACCEPT
    iptables -I IPV4_DOCKERFW_RULES -s nginx -d wordpress-wordpress_vegard-1.wordpress_wordpress_default -m tcp -p tcp --dport 80 -j ACCEPT
    iptables -I IPV4_DOCKERFW_RULES -s nginx -d grafana.grafana_grafana_default -m tcp -p tcp --dport 3000 -j ACCEPT
    iptables -I IPV4_DOCKERFW_RULES -s nginx -d portainer-portainer-1.portainer_portainer_network -m tcp -p tcp --dport 9443 -j ACCEPT
    iptables -I IPV4_DOCKERFW_RULES -s nginx -d deconz.homeassistant_hassio -m tcp -p tcp --dport 80 -j ACCEPT
    iptables -I IPV4_DOCKERFW_RULES -s nginx -d prometheus.grafana_grafana_default -m tcp -p tcp --dport 9090 -j ACCEPT

    iptables -A IPV4_DOCKERFW_RULES -j RETURN
}



function load_ipv6_rules {
    # Specific rules
    ip6tables -F IPV6_DOCKERFW_RULES

    ip6tables -I IPV6_DOCKERFW_RULES -s nginx -d homeassistant.homeassistant_hassio -m tcp -p tcp --dport 8123 -j ACCEPT
    ip6tables -I IPV6_DOCKERFW_RULES -s nginx -d nextcloud-app-1.nextcloud_nextcloud_default -m tcp -p tcp --dport 80 -j ACCEPT
    ip6tables -I IPV6_DOCKERFW_RULES -s nginx -d plex.pms_plex_default -m tcp -p tcp --dport 32400 -j ACCEPT
    ip6tables -I IPV6_DOCKERFW_RULES -s nginx -d wordpress-wordpress_vegard-1.wordpress_wordpress_default -m tcp -p tcp --dport 80 -j ACCEPT
    ip6tables -I IPV6_DOCKERFW_RULES -s nginx -d grafana.grafana_grafana_default -m tcp -p tcp --dport 3000 -j ACCEPT
    ip6tables -I IPV6_DOCKERFW_RULES -s nginx -d portainer-portainer-1.portainer_portainer_network -m tcp -p tcp --dport 9443 -j ACCEPT
    ip6tables -I IPV6_DOCKERFW_RULES -s nginx -d deconz.homeassistant_hassio -m tcp -p tcp --dport 80 -j ACCEPT
    ip6tables -I IPV6_DOCKERFW_RULES -s nginx -d prometheus.grafana_grafana_default -m tcp -p tcp --dport 9090 -j ACCEPT

    ip6tables -A IPV6_DOCKERFW_RULES -j RETURN
}

function init {
    # Allow all traffic out to DOCKERDMZ
    dmz_interface=$(ip address list | grep 192.168.28 | awk '{print $7}')
    iptables -I FORWARD -o "$dmz_interface" -j ACCEPT
    ip6tables -I FORWARD -o "$dmz_interface" -j ACCEPT

    # Allow all established traffic (needed for return packets)
    iptables -I FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
    ip6tables -I FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT

    # Hook the custom chains into FORWARD and default everything else to DROP
    iptables -N IPV4_DOCKERFW_RULES
    iptables -I FORWARD -j IPV4_DOCKERFW_RULES
    iptables -P FORWARD DROP

    load_ipv4_rules

    ip6tables -N IPV6_DOCKERFW_RULES
    ip6tables -I FORWARD -j IPV6_DOCKERFW_RULES
    ip6tables -P FORWARD DROP

    load_ipv6_rules
}

case "$1" in
    start)
        echo "Loading rules..."
        init
        ;;
    reload)
        load_ipv4_rules
        load_ipv6_rules
        ;;
esac

exit 0

Upon initializing the container, I run the script with start, but later on I can run it in the container with reload in case I have added something, or I have restarted things out of order.

docker exec -it infrastructure-firewall-1 /usr/local/etc/firewall/initfw.sh reload
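That reload could also be triggered automatically, reusing the same event-listening trick from earlier. A minimal sketch, assuming the container name and script path from the command above:

# Sketch: reload the firewall rules whenever any container starts
docker events --filter type=container --filter event=start \
    --format '{{.Actor.Attributes.name}}' | while read -r name; do
    echo "Container $name started, reloading firewall rules"
    docker exec infrastructure-firewall-1 /usr/local/etc/firewall/initfw.sh reload
done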

I might work on a bit of robustification later. Maybe the script could trap failed iptables commands and rerun them after an appropriate period.
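Something like this wrapper could do it; a minimal sketch, where the retry count and delay are arbitrary:

# Retry a command a few times before giving up (sketch)
run_with_retry() {
    local tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 5 ]; then
            echo "Giving up on: $*" >&2
            return 1
        fi
        sleep 10
    done
}

# Usage, wrapping one of the rules from load_ipv4_rules:
# run_with_retry iptables -I IPV4_DOCKERFW_RULES -s nginx -d grafana.grafana_grafana_default -m tcp -p tcp --dport 3000 -j ACCEPT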

While working on this, the thought of migrating to Kubernetes has grown on me. It has totally different concepts around networking, network policies built in, and pretty flexible plugin-based network management.

It might warrant a deep dive.
