
Running docker containers in existing network namespace (netns) #47828

Open
zackherbert opened this issue May 13, 2024 · 5 comments
Labels
kind/feature, status/0-triage

Comments

@zackherbert

Description

It should be possible and easy (and documented) to run docker containers in an existing network namespace (netns) on Linux.

There were previous attempts to solve this, which were rejected without providing an adequate alternative for users.

Use case:

  • the user wants to run an existing, complex project that uses docker containers (with docker-compose) behind a vpn
  • to avoid a vulnerability, the user created a new netns and runs openvpn inside it (using openvpn-netns, for example)
  • the user wants the containers to use this netns to connect to the internet
  • the user should NOT need to modify any Dockerfile to do this, as the project is constantly evolving, complex, and out of their control

Simplest example:

Let's say we already have a vpn configured inside the vpn0 netns we created.

Running curl ip.me prints isp-ip-address
Running sudo ip netns exec vpn0 curl ip.me prints vpn-ip-address

To test if a container is behind the vpn, we can use this simple Dockerfile:

FROM alpine:latest
RUN apk --no-cache add curl
CMD curl -s ip.me

Building the image with docker build -t ip-fetcher-simple .
and then running the container with docker run --rm ip-fetcher-simple will print isp-ip-address

If we try to run the container in the vpn0 netns with sudo ip netns exec vpn0 docker run --rm ip-fetcher-simple, it still prints isp-ip-address.
That's because the dockerd daemon is the one that creates the network bridge and the container's network namespace, and it still uses the "default" netns because that is where it was started; the docker CLI only sends an API request to the daemon, so wrapping the CLI in ip netns exec has no effect.
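
A quick way to confirm this (a sketch, assuming pidof is available): compare the network namespace of a shell running under ip netns exec with the one dockerd itself is in; the two nsfs inodes printed below will differ.

# the shell below runs inside vpn0, but dockerd keeps the netns it was started in
sudo ip netns exec vpn0 readlink /proc/self/ns/net
sudo readlink /proc/"$(pidof dockerd)"/ns/net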

Alternative 1 - select netns inside container

One way to solve this is to select the vpn0 netns INSIDE the container.

Our Dockerfile looks like this now:

FROM alpine:latest

RUN apk --no-cache add curl
RUN apk --no-cache add iproute2

RUN mkdir -p /var/run/netns

CMD ip netns exec vpn0 curl -s ip.me

Building: docker build -t ip-fetcher .
Running: docker run --privileged --rm --volume /var/run/netns:/var/run/netns:ro --volume /etc/netns/vpn0/resolv.conf:/etc/resolv.conf:ro ip-fetcher
And now it prints the vpn-ip-address correctly.

CONS:

  • you need to use --privileged
  • you need to modify the Dockerfiles

Alternative 2 - Setting dockerd in the vpn0 netns

If you're using systemd, it is apparently possible to run the dockerd daemon inside your created netns.

Stop docker:

systemctl stop docker.socket
systemctl stop docker.service

Edit docker.service using systemctl edit docker.service (so that package updates will not overwrite your changes), adding the following drop-in:

[Service]
NetworkNamespacePath=/run/netns/vpn0
BindReadOnlyPaths=/etc/netns/vpn0/resolv.conf:/etc/resolv.conf
PrivateMounts=no

# overwriting ExecStart did not work for me:
#ExecStart=
#ExecStart=/usr/bin/dockerd -H fd://

Then reload systemd services and restart docker:

systemctl daemon-reload
systemctl restart docker
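
Before testing a container, it may be worth checking that the override actually applied (a sketch, assuming the vpn0 netns is bind-mounted at /run/netns/vpn0); the two inode numbers should match:

# matching nsfs inodes mean NetworkNamespacePath= took effect
sudo stat -Lc '%i %n' /proc/"$(pidof dockerd)"/ns/net /run/netns/vpn0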

In theory, running the ip-fetcher-simple container above should now return vpn-ip-address.

In my case though (Ubuntu 22.04 with systemd 249 (249.11-0ubuntu3.12)), it did not work.
Building the image works, but trying to run the container generates the following error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown

And I would find in the system logs errors like this:

containerd failed to read init pid file
containerd: level=error msg="copy shim log" error="read /proc/self/fd/11: file already closed"
dockerd: level=error msg="stream copy error: reading from a closed fifo"

A StackOverflow answer points to a possible misconfiguration between dockerd and containerd, but /var/run/docker/containerd/containerd.toml does not exist on my system, so I'm stuck.

CONS:

  • all containers would now go through the vpn-connected bridge, not just selected ones
  • it did not work for me

Alternative 3: create your own custom docker network bridge and try to move its link inside your netns

Creating a custom network and running the container works and prints the isp-ip-address, as expected:

docker network create --opt com.docker.network.bridge.name=vpn0bridge vpn0net
docker run --rm --network vpn0net ip-fetcher-simple

ip link will return something like this:

---redacted-local-interfaces---
7: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:88:16:5f:c1 brd ff:ff:ff:ff:ff:ff
21: vpn0bridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:1d:f8:b7:84 brd ff:ff:ff:ff:ff:ff

Now if I try to move the vpn0bridge link into the vpn0 netns, it does not work. I get the following error:

$ sudo ip link set vpn0bridge netns vpn0
RTNETLINK answers: Invalid argument

It seems that it might be possible, with complex iptables magic, to redirect all traffic going through a custom bridge to our vpn gateway, but it seems fickle and would potentially still expose the user to the dhcp vulnerability in the use case above.

Alternative 4: use a libnetwork plugin

Apparently we can use network plugins from github.com/docker/libnetwork.

As far as I know, no documentation exists showing how to do this.
If this is possible, an easy step-by-step guide should be added to the docker networking tutorials.

Alternative 5: wire up the container netns after the containers are started

Once your containers are started, you can find out which netns docker uses for each one:

# Find out pid of container
pid=$(docker inspect -f '{{.State.Pid}}' container_name)

# Add container netns to /var/run/netns so it is detected by ip netns
sudo mkdir -p /var/run/netns
sudo ln -sf /proc/$pid/ns/net "/var/run/netns/container_name"

You can then use nsenter or ip netns exec, create veth pairs, move one end into the vpn0 netns, and modify the routing tables as needed.
This alternative is not acceptable because the containers are already started before the routing is modified, so they briefly have connectivity outside the vpn.
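
For completeness, a minimal sketch of that post-start wiring (interface names and the 10.200.0.0/24 subnet are purely illustrative, and "container_name" is the netns symlink created above):

# create a veth pair and split it between the container netns and the vpn0 netns
sudo ip link add c0-ext type veth peer name c0-int
sudo ip link set c0-ext netns container_name
sudo ip link set c0-int netns vpn0

# address both ends and bring them up
sudo ip -n container_name addr add 10.200.0.2/24 dev c0-ext
sudo ip -n vpn0 addr add 10.200.0.1/24 dev c0-int
sudo ip -n container_name link set c0-ext up
sudo ip -n vpn0 link set c0-int up

# point the container's default route at the vpn0 end
# (forwarding and masquerading inside vpn0 are still needed, as discussed below)
sudo ip -n container_name route replace default via 10.200.0.1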

Proposition

All this is far too complicated (or simply does not work) for a feature that docker should have supported ages ago, allowing users to better compartmentalize and secure container networks on Linux.

A possible elegant way to solve this would be to add a com.docker.network.bridge.netns option to the bridge driver options.

This would allow the user to create a new docker network with docker network create --opt com.docker.network.bridge.netns=vpn0 --opt com.docker.network.bridge.name=vpn0bridge vpn0net,

or even allow configuring the default bridge with a new netns key in the daemon.json file.
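
Purely as an illustration of the daemon.json half of the proposal (the netns key below is hypothetical and does not exist today; "bridge" is the existing key that selects the default bridge), it could look something like:

{
  "bridge": "docker0",
  "netns": "vpn0"
}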

zackherbert added the kind/feature and status/0-triage labels on May 13, 2024
@larsks
Contributor

larsks commented May 13, 2024

If you modify the process in option 3 a bit, I think you can get what you want.

We're going to leave the bridge in the global namespace (which may even be necessary for the bridge to function properly, but don't quote me on that), but that's okay! A bridge is a layer 2 device and doesn't really care about ip addresses. We're going to create the bridge without an ip address, like this:

docker network create vpn -d bridge \
  -o com.docker.network.bridge.inhibit_ipv4=true \
  -o com.docker.network.bridge.name=vpn0bridge

That gets us a new network:

$ docker network ls
NETWORK ID     NAME                    DRIVER    SCOPE
.
.
.
0e02e6182e43   vpn                     bridge    local

And a bridge without an ip address:

$ ip addr show vpn0bridge
26: br-0e02e6182e43: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:b9:4c:f0:9f brd ff:ff:ff:ff:ff:ff

If we start a container on this network, it will have a default route:

$ docker run -it --rm --net=vpn alpine sh
/ # ip route
default via 172.25.0.1 dev eth0
172.25.0.0/16 dev eth0 scope link  src 172.25.0.2

But the route won't go anywhere, because the gateway address hasn't been assigned to anything yet:

/ # ping -c1 172.25.0.1
PING 172.25.0.1 (172.25.0.1): 56 data bytes

--- 172.25.0.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

This is everything we need. Create a new namespace running openvpn, then create a veth pair. Attach one end to the docker network bridge and the other to the openvpn namespace, and ensure that the openvpn namespace has the address of the vpn network default gateway:

# create namespace
ip netns add openvpn

# create a veth pair with one end inside the namespace
ip link add openvpn-ext type veth peer name openvpn-int netns openvpn

# attach the outside end of the veth pair to the bridge
ip link set master vpn0bridge openvpn-ext

# give the inside end of the veth pair the gateway address
ip -n openvpn addr add 172.25.0.1/16 dev openvpn-int

# and make sure everything is up
ip -n openvpn link set openvpn-int up
ip link set openvpn-ext up

Now our container has a functioning default gateway:

/ # ping -c1 172.25.0.1
PING 172.25.0.1 (172.25.0.1): 56 data bytes
64 bytes from 172.25.0.1: seq=0 ttl=64 time=0.060 ms

--- 172.25.0.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.060/0.060/0.060 ms

Set up the routing table and netfilter rules inside the openvpn namespace as you see fit and you're all set. I think this hits all your requirements:

  • You don't need to muck around behind Docker's back; there's no need to go poking inside the network namespace of containers.
  • You don't need to make any changes to the bridge device itself, either; that is managed by Docker.
  • Your containers won't have network connectivity until the VPN is up and running
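
For the "routing table and netfilter rules" step above, a minimal sketch (assuming the VPN interface inside the namespace is tun0 and the 172.25.0.0/16 subnet from this example; the follow-up comments below arrive at essentially the same rules):

# enable forwarding inside the namespace and masquerade container traffic out of the VPN interface
ip netns exec openvpn sysctl -w net.ipv4.ip_forward=1
ip netns exec openvpn iptables -t nat -A POSTROUTING -s 172.25.0.0/16 -o tun0 -j MASQUERADE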

Note that you will either need to add some additional networking to the openvpn namespace so that you have outbound connectivity to reach the VPN server, or you will need to create the vpn interface in the global namespace and move it into the openvpn network namespace after the fact (which works with wireguard interfaces; I haven't tested that with openvpn devices).

Here's a visual of what you get:

 ┌─────────────────────────────────────┬──────────────────┐
 │                                     │                  │
 │  docker controls this               │ you control this │
 │                                     │                  │
 │                                     │                  │
 │ ┌─────────────┐                     │                  │
 │ │             │                     │                  │
 │ │ container1  ├───┐                 │                  │
 │ │             │   │                 │                  │
 │ └─────────────┘   │                 │                  │
 │                   │                 │                  │
 │ ┌─────────────┐   │    ┌──────────┐ │      ┌──────┐    │
 │ │             │   │    │          │ │      │      │    │
 │ │ container2  ├───┼────┤  bridge  ├─┤veth ─┤ vpn  │    │
 │ │             │   │    │          │ │      │      │    │
 │ └─────────────┘   │    └──────────┘ │      └──────┘    │
 │                   │                 │                  │
 │ ┌─────────────┐   │                 │                  │
 │ │             │   │                 │                  │
 │ │ container3  ├───┘                 │                  │
 │ │             │                     │                  │
 │ └─────────────┘                     │                  │
 │                                     │                  │
 │                                     │                  │
 └─────────────────────────────────────┴──────────────────┘

PS: I hadn't looked at this in a number of years; I'm actually quite happy with this solution. The ability to create the bridge but not configure an address on it opens up a lot of flexibility.

@zackherbert
Author

zackherbert commented May 14, 2024

@larsks Thanks a lot, you pointed me in the right direction.

Two things to add:

  • the docker network gateway/subnet was different for me, but you can get it with docker network inspect.
  • You need to configure masquerading in the netns for it to work correctly.

Debugging

Pinging the ip address of the veth pair worked correctly from inside a container on my custom network, but pinging 8.8.8.8 did not work.
I used sudo ip netns exec vpn0 tcpdump -i any -U -w - | tcpdump -nn -r - to debug, and noticed that the ICMP packets forwarded to the vpn tun0 interface still had the container's docker-subnet address as source instead of the vpn local address.

Once everything was configured, my routing table on vpn0 looked like this:

$ sudo ip netns exec vpn0 ip route
default via 10.96.0.1 dev tun0 
10.96.0.0/16 dev tun0 proto kernel scope link src 10.96.0.21 
172.20.0.0/16 dev vpn0-int proto kernel scope link src 172.20.0.1

If I ping the internet from this netns, it works correctly.

$ sudo ip netns exec vpn0 ping -c1 8.8.8.8

Running tcpdump will show:
08:49:07.110868 tun0  Out IP 10.96.0.21 > 8.8.8.8: ICMP echo request, id 4367, seq 1, length 64
08:49:07.128420 tun0  In  IP 8.8.8.8 > 10.96.0.21: ICMP echo reply, id 4367, seq 1, length 64

Now, I would like to ping the internet (8.8.8.8) from inside a new docker container using the vpn0net network.

docker run -it --rm --net=vpn0net alpine sh
/ # ping -c1 8.8.8.8

This, however, does not work. I don't receive a response.

Running tcpdump again will show:

08:46:31.337521 vpn0-int In  IP 172.20.0.2 > 8.8.8.8: ICMP echo request, id 10, seq 0, length 64
08:46:31.337540 tun0  Out IP 172.20.0.2 > 8.8.8.8: ICMP echo request, id 10, seq 0, length 64

So I needed to add a masquerading rule in my netns:

sudo ip netns exec vpn0 iptables -t nat -A POSTROUTING -s $vpn0net_subnet -o tun0 -j MASQUERADE

And then the ping works and tcpdump shows:

10:04:46.075072 ?     In  IP 172.20.0.2 > 8.8.8.8: ICMP echo request, id 8, seq 0, length 64
10:04:46.075107 ?     Out IP 10.96.0.21 > 8.8.8.8: ICMP echo request, id 8, seq 0, length 64
10:04:46.093446 ?     In  IP 8.8.8.8 > 10.96.0.21: ICMP echo reply, id 8, seq 0, length 64
10:04:46.093502 ?     Out IP 8.8.8.8 > 172.20.0.2: ICMP echo reply, id 8, seq 0, length 64

Solution

Assuming we already have a netns named vpn0 containing an interface tun0 from which we can already ping the internet,
here is how to create a custom docker network that uses this netns as its default gateway:

# Create a docker network bridge named vpn0net, without any ip address configured:
docker network create vpn0net -d bridge \
  -o com.docker.network.bridge.inhibit_ipv4=true \
  -o com.docker.network.bridge.name=vpn0bridge
  
# Then I find out the gateway and subnet defined for this docker network:
export vpn0net_gateway=`docker network inspect vpn0net -f '{{ (index .IPAM.Config 0).Gateway}}'`
export vpn0net_subnet=`docker network inspect vpn0net -f '{{ (index .IPAM.Config 0).Subnet}}'`

echo $vpn0net_gateway
172.20.0.1

echo $vpn0net_subnet
172.20.0.0/16

# Create a veth, setting one end inside the vpn0 netns, and connecting the other end to the docker bridge
ip link add vpn0-ext type veth peer name vpn0-int netns vpn0
ip link set master vpn0bridge vpn0-ext

# Add the gateway ip address to the vpn0-int link
ip -n vpn0 addr add $vpn0net_gateway/16 dev vpn0-int

# Start the veth interfaces
ip -n vpn0 link set vpn0-int up
ip link set vpn0-ext up

# Enable ip forwarding and set up masquerading
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
sudo ip netns exec vpn0 sysctl -w net.ipv4.ip_forward=1
sudo ip netns exec vpn0 iptables -t nat -A POSTROUTING -s $vpn0net_subnet -o tun0 -j MASQUERADE

# Once this is done, you can run any container in the vpn0net docker network and it should work:
$ docker run -it --rm --net=vpn0net alpine sh
/ # ping -c1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=58 time=27.309 ms

# Note that you also need to mount the correct resolv.conf to avoid any dns leaks:
docker run -it --rm --net=vpn0net --volume /etc/netns/vpn0/resolv.conf:/etc/resolv.conf:ro alpine sh

@zackherbert
Author

I'm still leaving this issue open as it would be easier, more secure and probably more efficient if docker allowed us to set the bridge directly inside our netns with a bridge network driver option.

@larsks
Contributor

larsks commented May 14, 2024

the docker network gateway/subnet was different for me, but you can get it with docker network inspect.

Right, that's expected. If it makes things easier, you can set a static network range using the --subnet and --ip-range options to docker network create.
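
For example (a sketch that recreates the network above with a pinned range):

# pin the subnet and ip range so the gateway address is predictable
docker network create vpn0net -d bridge \
  --subnet 172.20.0.0/16 --ip-range 172.20.0.0/24 \
  -o com.docker.network.bridge.inhibit_ipv4=true \
  -o com.docker.network.bridge.name=vpn0bridge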

You need to configure masquerading in the netns for it to work correctly.

That would be the "Set up the routing table and netfilter rules inside the openvpn namespace as you see fit" step that I mentioned :).

@stormshield-gt

stormshield-gt commented May 14, 2024

@zackherbert would you mind also sharing how you integrated your solution with docker compose? It would be very valuable to me.

Edit: experimenting on my side, I ended up launching a script that creates the network namespace and the docker network, and then running docker compose up. I've declared the network as external inside the compose file. Please let me know if you have a better solution.
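
For reference, a minimal sketch of that setup (the service name is illustrative; it reuses the ip-fetcher-simple image and the external vpn0net network created by the script):

# docker-compose.yml (sketch)
services:
  app:
    image: ip-fetcher-simple
    networks:
      - vpn0net

networks:
  vpn0net:
    external: true

# then run: docker compose up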
