Docker Gotchas

Back at Windy City Rails 2016 I gave a lightning talk about some Docker gotchas I've found while developing our internal and production applications. Most of these issues are well known to people who deal with Docker a lot, but for a newcomer they might not be so obvious. The lightning talk was only five minutes long and I didn't get into all the details I wanted to cover, so this blog post is pretty much a follow-up.

Overview

Docker and UFW don't play nice
Containers don't have persistent IP address
Port publishing creates docker proxy processes
Out-of-memory errors can knock down your OS
Dangling and untagged images pollute FS

Docker and UFW don't play nice

UFW (Uncomplicated Firewall) is a simple yet powerful tool for managing iptables on Ubuntu. However, if you've ever used Docker and UFW together you might have learned that the two don't play nice, at least not with the out-of-the-box configuration.

Let's pretend we have a multi-host application: the first server runs the load balancer and the application, and the second one runs a database server. Typically, you would lock down both servers and only allow traffic to specific ports from known sources. On the app server that would be 22 (ssh), 80 (http) and 443 (https); on the database server, 5432 (postgres). Example:

# Deny all incoming connections by default
$ ufw default deny incoming 

# Allow all outgoing connections by default
$ ufw default allow outgoing

# Allow specific inbound ports
$ ufw allow 22  # SSH traffic
$ ufw allow 80  # HTTP traffic
$ ufw allow 443 # Optional HTTPS traffic

# Enable firewall
$ ufw enable

On your database box you would restrict access to only the application server:

# Allow connections to local PostgreSQL server from 10.0.0.1
$ ufw allow from 10.0.0.1 to any port 5432
$ ufw enable

If you were running PostgreSQL natively, a connection to port 5432 from any other host would eventually time out (or be refused, depending on the default firewall policy). That's not the case with Docker. For example, you'd start your database container with a command like this:

  $ docker run -p 5432:5432 -d postgres:9.5

So, what's the problem? Well, given that you have configured your firewall, the following command should not work, but it does:

  psql -h my-public-ip -U myuser mydatabase

And if you use weak passwords or haven't properly configured postgres security settings, anybody on the network can gain access to the postgres server. It's a typical problem on Rackspace Cloud, DigitalOcean, Linode or any other VPS provider, since they allocate IP addresses (usually in the 10.x.x.x range) from a shared network space.

The main reason your firewall stops working for Docker containers is pretty simple: when installed, the docker daemon alters your OS iptables with its own chains. For example, I created a new droplet on DigitalOcean with a vanilla Ubuntu server, then took a snapshot of iptables before and after the docker install. Here's the quick diff:

diff --git a/before.txt b/after.txt
index 1438dd5..69e85a2 100644
--- a/before.txt
+++ b/after.txt
@@ -9,6 +9,11 @@ ufw-track-input  all  --  anywhere             anywhere

 Chain FORWARD (policy DROP)
 target     prot opt source               destination
+DOCKER-ISOLATION  all  --  anywhere             anywhere
+DOCKER     all  --  anywhere             anywhere
+ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
+ACCEPT     all  --  anywhere             anywhere
+ACCEPT     all  --  anywhere             anywhere
 ufw-before-logging-forward  all  --  anywhere             anywhere
 ufw-before-forward  all  --  anywhere             anywhere
 ufw-after-forward  all  --  anywhere             anywhere
@@ -25,6 +30,13 @@ ufw-after-logging-output  all  --  anywhere             anywhere
 ufw-reject-output  all  --  anywhere             anywhere
 ufw-track-output  all  --  anywhere             anywhere

+Chain DOCKER (1 references)
+target     prot opt source               destination
+
+Chain DOCKER-ISOLATION (1 references)
+target     prot opt source               destination
+RETURN     all  --  anywhere             anywhere
+
 Chain ufw-after-forward (1 references)
 target     prot opt source               destination

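In other words, Docker alters iptables in a way that makes UFW useless: traffic to a published port goes through Docker's own rules in the FORWARD chain and never reaches the input rules that UFW manages. There are a couple of ways to address the issue, but they all depend on your application's setup.

The first step is to disable iptables modification in the Docker configuration. Edit the /etc/default/docker file: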

DOCKER_OPTS="--iptables=false"

A Docker restart is required at this point. This configuration change usually works fine unless you have a complicated docker networking setup; in some cases inter-container networking might break.

Another option is to audit your iptables and build a configuration that suits your needs. That will probably be time consuming, but you'll end up with a system where your own rules coexist with docker's.

Finally, do not publish any ports that aren't necessary. If you're running a database container that's only used by another container locally, don't expose extra ports on your system: it adds overhead and is also a security risk. Need to connect directly to the production database from your local machine? Use SSH tunnels, they work just fine.
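To check which containers currently publish ports on all interfaces, something like the following could help. It's only a sketch: the function name find_public_ports is made up for this post, and it assumes the Ports column of docker ps prints bindings in the usual 0.0.0.0:5432->5432/tcp form (IPv6 bindings are ignored).

```shell
# List containers that publish at least one port on all IPv4 interfaces.
# Reads "name ports" lines on stdin, as printed by:
#   docker ps --format '{{.Names}} {{.Ports}}'
find_public_ports() {
  awk 'index($0, "0.0.0.0:") { print $1 }'
}

# Usage:
#   docker ps --format '{{.Names}} {{.Ports}}' | find_public_ports
#
# To keep a port off the public interface, bind it to loopback when publishing:
#   docker run -d -p 127.0.0.1:5432:5432 postgres:9.5
```

With the database bound to loopback, an SSH tunnel such as ssh -N -L 5433:127.0.0.1:5432 user@db-host (user and host are placeholders) gives you access from your local machine on port 5433.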

Also, check out the GitHub issue to see other solutions.

Containers don't have persistent IP address

When docker starts a container, it allocates an internal IP address for it, usually something like 172.17.0.X. Say you then add another container that talks to the first one, and you wire them together by looking up the first container's IP and exposing it as an environment variable (rough example). You can get the IP by inspecting the container:

docker inspect --format '{{ .NetworkSettings.IPAddress }}' id

Then all of a sudden your server gets rebooted (crash, etc.) and the app stops working. Why is that? The gotcha here is that a container's IP address is not persistent, so you can't rely on it. Let's have a look at a basic demo:

# Create first container
$ docker run -d --name=myapp ruby ping google.com
a3bfaa3be952cb28b8a033d9121f86205d37966e9dd9e464b89c6c0a8d6e4810

# Inspect ip address
$ docker inspect --format '{{ .NetworkSettings.IPAddress }}' myapp
172.17.0.2

# We're stopping the app
$ docker stop myapp

# Create another container
$ docker run -d --name=myapp2 ruby ping facebook.com
6cc90fc176d9fb2868abd2e998b8830e29a9e6262f81895a48babfd65b77534c

# Start the old container
$ docker start myapp

# Check if the IP is the same
$ docker inspect --format '{{ .NetworkSettings.IPAddress }}' myapp
172.17.0.3 # <-- THIS IP CHANGED

# Check another container's IP
$ docker inspect --format '{{ .NetworkSettings.IPAddress }}' myapp2
172.17.0.2

As you can see, when we stop a container, start a second one and then start the first one again, the IP addresses change. If you use the IP address directly, your application will break.

What's the fix? One option is to use container linking, which allows you to reference containers by name. Example:

docker run --name=db -d postgres 
docker run --link=db -d myapp

Depending on how you configure the application, you can use container names in your setup, like DATABASE_URL=postgres://user:pass@db:5432/app. If a container that your app depends on gets restarted or goes down, docker will keep track of its IP. To clarify a bit how it all works: docker simply manages the /etc/hosts file in any container that has links. See the example below:

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2  redis 1d624c9df576
172.17.0.3  postgres 4576e8e78c18
172.17.0.4  ea9d62a71c95

When you try to stop/start dependent containers in different orders, docker will update the hosts file without breaking your app.

Links are being deprecated at this point (still useful though), so you might consider using docker networks. Custom networks give you flexibility and also visibility into how your components fit together. Another advantage of networks is that you can assign a persistent IP address to any container on the network (unlike with links): use the --ip flag when starting the container with docker run.
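Here's roughly what that could look like. Treat it as a sketch: the function name, the network name appnet, the subnet and the fixed address are all made up for illustration, and myapp is the same placeholder image as above. As far as I know, --ip is only accepted on user-defined networks that were created with an explicit subnet.

```shell
# Start a database and an app on a user-defined network.
# All names and addresses below are hypothetical examples.
start_on_network() {
  net=appnet

  # A user-defined network with an explicit subnet, which --ip requires.
  docker network create --subnet=172.25.0.0/16 "$net"

  # The database gets a fixed address and is also resolvable by name ("db").
  docker run -d --name=db --network="$net" --ip=172.25.0.10 postgres

  # The app refers to the database by name, not by IP.
  docker run -d --name=web --network="$net" \
    -e DATABASE_URL=postgres://user:pass@db:5432/app myapp
}

# Usage:
#   start_on_network
```

Even with a fixed address available, referring to the database by its container name is usually the more robust choice, since it keeps working if you ever change the subnet.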

Port publishing creates docker proxy processes

This one is not really an issue, more something you should know about how docker networking operates. Say you start a new container that publishes a port:

docker run -d -p 80:5000 myapp

Our app is now reachable on host port 80 (TCP), which is exposed to the internet, while the process inside the container listens on port 5000. If the app is popular enough you'll notice something else creeping into the top output.

For each port published on the host machine, docker starts a TCP/UDP proxy process (docker-proxy) that proxies the traffic to your container process. To see this in action, run the following command:

ps aux | grep docker-proxy

Output might look like this:

docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 80 -container-ip 172.17.0.4 -container-port 5000

If your application does a lot of network I/O you might notice that each proxy process eats some CPU (40% in one of my cases). In most setups it's not a problem at all (you have other bottlenecks), but keep in mind that docker is not just magic: it has its own internal tools that make the magic happen behind the curtains.

Rule of thumb here: don't publish ports unless you really have to.
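If you have many published ports, the raw ps output gets noisy. A small helper like the one below (the name summarize_proxies is made up) condenses each docker-proxy command line into a "host port -> container" mapping. It assumes the flags look like the ones in the sample output above.

```shell
# Summarize docker-proxy command lines read from stdin as
# "<host-port> -> <container-ip>:<container-port>".
summarize_proxies() {
  awk '/docker-proxy/ {
    hp = ""; ip = ""; cp = ""
    for (i = 1; i < NF; i++) {
      if ($i == "-host-port")      hp = $(i + 1)
      if ($i == "-container-ip")   ip = $(i + 1)
      if ($i == "-container-port") cp = $(i + 1)
    }
    print hp " -> " ip ":" cp
  }'
}

# Usage:
#   ps -eo args | summarize_proxies
```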

Out-of-memory errors can knock down your OS

OOM errors happen when your OS does not have enough memory to continue operating. Most are mitigated by restarting processes, but some can send the OS into an unrecoverable halt that only a reboot can fix. Such errors are easy to observe on DigitalOcean droplets, which by default don't come with any swap.

$ docker run -d my-beefy-rails-app

Swap, in general, is very important to the OS: it is used as extra memory space, although a very slow one, to keep running programs that would otherwise fail. When you start a new container, it has no limit on how much memory it can use. In the event of a memory leak in your app, allocated memory will keep growing until it reaches total exhaustion and the system has no other choice but to (ex)terminate it. With "older" linux kernels this could result in a system error that leaves your OS unresponsive. This kind of error is not trivial to reproduce, and usually indicates that something in your app is misbehaving.

When running docker containers, it's always a good idea to set up some sort of resource constraints along with a restart policy:

$ docker run -d \
  --restart=always \
  --memory=512m \
  --memory-swap=0 \
  my-beefy-rails-app

When the app uses too much memory, it'll be automatically killed, and with the restart policy in place you'll know the app will be restarted as soon as it dies, regardless of the reason (crash, etc.).

Most of the issues related to resource utilization (memory, CPU, network) can be prevented by simply observing your application. That means logging, instrumentation, monitoring and more. Docker comes with a remote API that's pretty useful by itself. You can set up tools like InfluxDB, Grafana, Logstash and others to gather and visualize resource usage and set up proper alerting when something is up.
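To find containers that are still running without a memory limit, you could filter the output of docker inspect. This is a sketch with a made-up function name; it relies on docker reporting HostConfig.Memory as 0 when no limit is set.

```shell
# Print the names of containers that have no memory limit.
# Reads "name limit-in-bytes" lines on stdin; a limit of 0 means "unlimited".
unlimited_containers() {
  awk '$2 == 0 { sub("^/", "", $1); print $1 }'
}

# Usage (inspect prefixes names with "/", which the function strips):
#   docker ps -q \
#     | xargs docker inspect --format '{{.Name}} {{.HostConfig.Memory}}' \
#     | unlimited_containers
```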

Docker is moving fast, so make sure to stay up to date with latest linux kernels, and don't forget to enable swap on your OS!
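Not sure whether your box has swap at all? On Linux you can read it from /proc/meminfo. The helper below is a small sketch (the function name is made up); the swap file commands in the comments are the usual recipe, with the path and size as examples only.

```shell
# Print the total amount of swap in kB.
# $1: meminfo file to read (defaults to /proc/meminfo on Linux)
swap_total_kb() {
  awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Usage:
#   [ "$(swap_total_kb)" -gt 0 ] || echo "no swap configured"
#
# A common way to add a 1 GB swap file (run as root):
#   fallocate -l 1G /swapfile && chmod 600 /swapfile
#   mkswap /swapfile && swapon /swapfile
```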

Dangling and untagged images pollute FS

Docker images are not what you think they are. Coming from Vagrant, I noticed that a lot of people think of a docker image as a single filesystem snapshot. That's not true: an image is a collection of filesystem layers. How exactly those layers are stored depends on how your docker engine is configured, since there are a few storage drivers available.

In a typical scenario, you would make a few changes to your app and then build a docker image:

docker build -t hello/myapp .

Then you have some sort of automated system that takes those freshly built images and deploys them onto docker nodes. Repeat the cycle hundreds of times and you end up with dangling (untracked) images. When I referred to docker images as layers, I meant that each step in your Dockerfile produces a filesystem layer that can be cached and reused by other builds or by totally different images.

On your servers you probably use the docker pull mycorp/myimage command to pull the latest changes. A slight comment here: that command is actually interpreted as docker pull mycorp/myimage:latest. Here latest is a tag like any other; it points to whatever image was last pushed to the docker registry under that tag. When you pull images without a tag you're always referring to latest, which changes over time. Anyway, when you keep pulling the latest changes and you don't tag your images, you might end up with many untracked, unused image layers:

$ docker images
# or docker images -a

REPOSITORY       TAG             IMAGE ID        CREATED          SIZE
myapp            latest          3cfbce003800    33 hours ago
<none>           <none>          58e12b181489    2 days ago         1.016 GB
<none>           <none>          09c6230a686f    2 days ago         1.024 GB
<none>           <none>          559efd23e19c    2 days ago         1.024 GB
<none>           <none>          ec6f4f18c90c    2 days ago         1.035 GB
<none>           <none>          c50506c9fa32    2 days ago         1.034 GB
<none>           <none>          dd9429b92f28    3 days ago         1.033 GB
<none>           <none>          60534a5aa2b6    3 days ago         1.033 GB
<none>           <none>          46a302aa0da1    3 days ago         1.029 GB
<none>           <none>          3497cd79d8e0    3 days ago         1.029 GB
<none>           <none>          b154ef538cb2    3 days ago         1.029 GB
<none>           <none>          b6a176f9183c    3 days ago         1.027 GB

There are going to be a lot of unused layers lying around, so why not just get rid of them periodically? You can do that with a simple command. Docker will refuse to delete an image if there's a container (running or stopped) that uses it, so images that are in use stay in place:

$ docker images -q | xargs docker rmi

Deleted: sha256:58e12b18148976dda668b1d001745853d4997
Deleted: sha256:fd0161ef5c76870cd7a2afe8cada44de5474594
Deleted: sha256:22b96627b93798445d9af6e53bfbc68fde4df14
Deleted: sha256:03879b4386b3362486fc2fe209433dd7177e16
Deleted: sha256:09c6230a686f907721bc4bbfe4009c10872253
Deleted: sha256:088e3f6d5febe3ef82543345aacb12dd7df1ea2

Error response from daemon: conflict: unable to delete 3cfbce003800
(cannot be forced) - image is being used by running container 9298939fdffd
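If you'd rather not throw every image at docker rmi and rely on the errors, you can limit the cleanup to dangling images with the dangling=true filter. A minimal sketch, with a made-up function name:

```shell
# Remove only dangling (<none>:<none>) images.
remove_dangling_images() {
  ids=$(docker images -q -f dangling=true)
  if [ -z "$ids" ]; then
    echo "no dangling images"
    return 0
  fi
  # $ids is intentionally unquoted so each image id becomes its own argument.
  docker rmi $ids
}

# Usage (e.g. from a periodic cron job):
#   remove_dangling_images
```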

There's a slight issue with removing unused images. If you plan on doing deployment rollbacks, older images that are no longer in use will be gone and will have to be re-fetched on request. And if you don't tag them, there's no way to know which particular image should be used, since latest always points to the latest image release. Tagging is easy: you can use git tags or even git commit hashes from your deployment branch.
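As an example of commit-based tagging, the sketch below builds an image tagged with the short hash of the current git commit and also moves latest to it. The function name is made up, and hello/myapp is the same example repository used in the build command earlier.

```shell
# Build an image tagged with the current git commit, plus "latest".
# $1: repository name, e.g. hello/myapp
# Prints the commit-based image reference on success.
build_tagged() {
  repo=$1
  tag=$(git rev-parse --short HEAD) || return 1
  docker build -t "$repo:$tag" -t "$repo:latest" . || return 1
  echo "$repo:$tag"
}

# Usage:
#   build_tagged hello/myapp
```

A rollback then becomes a matter of deploying the image reference from an earlier commit instead of guessing which untagged image was the previous release.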

Conclusion

Docker moves fast. Keep track of changes, follow blogs, etc.
Nothing is magic, it's all just a set of tools.
Don't trust defaults.
Investigate your setup, make your systems fail to learn how to address the issues.
Experiment, you'll learn more than you ever did.

If you have any questions or feel like some information in this blog post is wrong, feel free to comment in the box below. Happy hacking! :)