Using LXD and cgroups

Posted by Kyle Olivo on August 21, 2016

Let’s have a little fun with LXC containers and cgroups. We’ll create a container and then constrain the number of processes that container can spawn so a rogue user can’t create a fork bomb inside of it!

First, let’s create a Vagrant box to serve as our playground for this activity. If you aren’t familiar with Vagrant, take a look at its documentation to learn how to install it.

$> vagrant init bento/ubuntu-16.04
$> vagrant up
$> vagrant ssh

Now let’s install the LXD package.

$> sudo apt-get install lxd

If you read the previous blog post, you know that LXC is a set of tools for managing containers on a Linux system. LXD is a daemon that manages LXC containers through a REST API, reachable either over the network or through a redesigned ‘lxc’ command line client (which itself talks to the daemon via the REST API).

Next we’ll need to add ourselves to the ‘lxd’ group and initialize lxd.

vagrant@vagrant:~$ sudo usermod -a -G lxd vagrant
vagrant@vagrant:~$ exit
$> vagrant ssh
vagrant@vagrant:~$ groups
vagrant adm cdrom sudo dip plugdev lpadmin sambashare lxd
vagrant@vagrant:~$ sudo lxd init
Name of the storage backend to use (dir or zfs): dir
Would you like LXD to be available over the network (yes/no)? no
Do you want to configure the LXD bridge (yes/no)? yes
... # note: just hit enter on the questions about configuring the network
LXD has been successfully configured.
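
The logout and login in the transcript above matter because a session only picks up new group memberships when it starts. A setup script could guard against a stale session with a check like the following. This is a minimal sketch: the helper name ‘in_group’ is my own invention and is not part of LXD.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not an LXD tool).
# Succeeds if the current session's groups include the one given as $1.
in_group() {
    # id -nG prints the session's group names separated by spaces;
    # tr puts one name per line so grep -x matches whole names only.
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# Example guard: warn before touching LXD if the new group is not active yet.
if ! in_group lxd; then
    echo "not in the lxd group yet; log out and back in first" >&2
fi
```

The check looks at the running session rather than /etc/group, which is why it keeps failing until you have logged in again.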

If you just type the ‘lxc’ command, you’ll see a list of subcommands.

vagrant@vagrant:~$ lxc
Usage: lxc [subcommand] [options]
Available commands:
       	config     - Manage configuration.
       	copy       - Copy containers within or in between lxd instances.
       	delete     - Delete containers or container snapshots.
       	exec       - Execute the specified command in a container.
       	file       - Manage files on a container.
       	help       - Presents details on how to use LXD.
       	image      - Manipulate container images.
       	info       - List information on LXD servers and containers.
       	launch     - Launch a container from a particular image.
       	list       - Lists the available resources.
       	move       - Move containers within or in between lxd instances.
       	profile    - Manage configuration profiles.
       	publish    - Publish containers as images.
       	remote     - Manage remote LXD servers.
       	restart    - Changes state of one or more containers to restart.
       	restore    - Set the current state of a resource back to a snapshot.
       	snapshot   - Create a read-only snapshot of a container.
       	start      - Changes state of one or more containers to start.
       	stop       - Changes state of one or more containers to stop.
       	version    - Prints the version number of this client tool.

  --all              Print less common commands.
  --debug            Print debug information.
  --verbose          Print verbose information.

  LXD_CONF           Path to an alternate client configuration directory.
  LXD_DIR            Path to an alternate server directory.

Many of the commands are quite intuitive. The most important subcommands for now are ‘launch’, ‘start’, ‘stop’, and ‘delete’. Let’s try to launch a container and connect to it.

vagrant@vagrant:~$ lxc launch ubuntu:
Creating irreplaceable-madie
Retrieving image: 100%
Starting irreplaceable-madie
vagrant@vagrant:~$ lxc list
|        NAME         |  STATE  |         IPV4          |                     IPV6                      |    TYPE    | SNAPSHOTS |
| irreplaceable-madie | RUNNING | (eth0) | fd63:a75e:9d73:78ff:216:3eff:fe94:b32c (eth0) | PERSISTENT | 0         |

This command says that we want to launch an Ubuntu container. Since the image name after the colon was omitted, we’ll be given the latest version of the Ubuntu container. We didn’t supply a container name either, so one is generated, and it will be different for every container. This one happens to be called ‘irreplaceable-madie’. Now let’s look at some information about the container.

vagrant@vagrant:~$ lxc info irreplaceable-madie
Name: irreplaceable-madie
Architecture: x86_64
Created: 2016/08/22 04:59 UTC
Status: Running
Type: persistent
Profiles: default
Pid: 3852
  eth0:	inet 	vethEKODR3
  eth0:	inet6  	fd63:a75e:9d73:78ff:216:3eff:fe94:b32c 	vethEKODR3
  eth0:	inet6  	fe80::216:3eff:fe94:b32c       	vethEKODR3
  lo:  	inet
  lo:  	inet6  	::1
  Processes: 24
  Memory usage:
    Memory (current): 59.80MB
    Memory (peak): 166.42MB
  Network usage:
      Bytes received: 2.12kB
      Bytes sent: 1.44kB
      Packets received: 20
      Packets sent: 12
      Bytes received: 264 bytes
      Bytes sent: 264 bytes
      Packets received: 4
      Packets sent: 4

Two items are worth noting here. First, we have the process ID (pid) of the container (3852). Second, we get a count of the processes currently running in the container (24). We’re going to place an upper limit on the number of processes that can run, because we want to prevent a fork bomb from running on our system. Fork-exec is the mechanism Linux uses to create new processes. By chaining together an unending series of forks, a fork bomb exhausts the system’s capacity for processes and denies service to anyone who actually wants to use our container. We won’t prove that the fork bomb works, because that would likely crash your container and even your Vagrant box (the container has unconstrained access to the resources of the Vagrant host it is running on). Let’s place a limit on the number of processes so that this is no longer possible.
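
If you want those two numbers in a script rather than reading them off the screen, a little awk is enough. The sketch below assumes the ‘Key: value’ layout that ‘lxc info’ printed above (other LXD versions may format things differently), and ‘info_field’ is a name I made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "Key: value" line read from stdin.
# Assumes the "lxc info" layout shown above; other LXD versions may differ.
info_field() {
    awk -v key="$1" -F': *' '
        { sub(/^[ \t]+/, "", $1) }       # drop indentation before the key
        $1 == key { print $2; exit }     # first matching line wins
    '
}

# Usage on the Vagrant box:
#   lxc info irreplaceable-madie | info_field Pid
#   lxc info irreplaceable-madie | info_field Processes
```

Reading from stdin keeps the helper independent of LXD itself, so it can be tried against a saved copy of the output as well.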

Before we set this limit, let’s explore cgroups (control groups) a little. Cgroups are implemented as data structures within the Linux kernel, so we need a way to manipulate something that lives in the kernel. Linux provides a few mechanisms for reading from and writing to the kernel. The first is the mostly read-only interface of the /proc filesystem. If we ‘ls’ that directory just like any other filesystem, we see a list of files that are actually live representations of data from the kernel. For instance, let’s look at some information about our memory usage.

vagrant@vagrant:~$ cat /proc/meminfo
MemTotal:         500192 kB
MemFree:          244388 kB
MemAvailable:     410848 kB
Buffers:           17284 kB
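
Because these are plain text files, a script can pick out a single value just as easily as we can read it. Here is a small sketch; the function name ‘meminfo_kb’ and its optional file argument are my own additions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value (in kB) of one /proc/meminfo field.
# The optional second argument points it at a saved copy of the file instead.
meminfo_kb() {
    # Field names in the file carry a trailing colon, e.g. "MemFree:".
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}

# Usage:
#   meminfo_kb MemAvailable
```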

By simply issuing the ‘cat’ command on any of these files, we can see what the kernel’s current understanding of the system is. Interestingly, there is a cgroups file in /proc as well.

vagrant@vagrant:~$ cat /proc/cgroups
#subsys_name   	hierarchy      	num_cgroups    	enabled
cpuset 	9      	3      	1
cpu    	4      	136    	1
cpuacct	4      	136    	1
blkio  	7      	136    	1
memory 	8      	191    	1
devices	10     	136    	1
freezer	3      	3      	1
net_cls	5      	3      	1
perf_event     	11     	3      	1
net_prio       	5      	3      	1
hugetlb	6      	3      	1
pids   	2      	137    	1
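
The last column is the one a script would care about most. As a rough sketch (the helper name ‘enabled_controllers’ is hypothetical), this lists every controller whose ‘enabled’ flag is 1 in /proc/cgroups:

```shell
#!/bin/sh
# Hypothetical helper: list the controllers whose "enabled" column is 1.
# Reads /proc/cgroups by default; pass another path to use a saved copy.
enabled_controllers() {
    # Skip the "#subsys_name ..." header, then print column 1 where column 4 is 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# Usage:
#   enabled_controllers | grep -qx pids && echo "pids controller available"
```

That is a quick way to confirm the ‘pids’ line is present and enabled before relying on /proc/cgroups any further.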

/proc/cgroups tells us which subsystems (also called controllers) the kernel provides and whether each one is enabled. A subsystem is just a grouping of resources for accounting and control purposes. For instance, the ‘cpuset’ subsystem above controls which individual CPU cores a group may use, ‘cpu’ controls CPU time, and ‘pids’ controls the number of processes. The other information here isn’t critical for what we’re trying to accomplish, but this document is the best resource I’ve found for a really in-depth, yet digestible, explanation of what cgroups are. It also provides descriptions of each of the subsystems listed above. From /proc/cgroups we now know there is a subsystem called ‘pids’, which holds the information we need. We can combine this with the pid of our container (shown to be 3852 earlier) to learn more about the cgroups for our process.

vagrant@vagrant:~$ cat /proc/3852/cgroup

The /proc filesystem provides information about every process running on the system, so by looking in /proc/[YOUR_PID] we can learn a lot about the environment surrounding our process. For now we only care about the cgroup information. Issuing ‘cat’ on this file prints one line per cgroup hierarchy in the format id:subsystems:path. Our goal, again, is to control the number of processes, so let’s focus on the ‘pids’ line. Just as there is a read path (/proc) for kernel information, there is another filesystem path for manipulating data in the kernel (/sys). If you issue a ‘mount’ command, you can see the subsystems mounted on the filesystem.

vagrant@vagrant:~$ mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)

Great! We can see where the pids subsystem is mounted on the filesystem, and which options were used to mount it. If we dive into that directory, we can find the file that controls the maximum number of processes available in our container. We just need to combine the subsystem mount point with the pids path we found for our specific process above (which was /lxc/irreplaceable-madie/init.scope). There is one ‘gotcha’, though: we need to disregard the ‘init.scope’ subdirectory and apply our cgroup change at the level of the container name itself.
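
Putting those pieces together by hand is error-prone, so here is a sketch of the same path arithmetic in awk. It assumes cgroup v1 with the pids controller mounted at /sys/fs/cgroup/pids, as in the mount output above, and the function name ‘pids_max_path’ is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the pids.max path for the container that owns a
# process, given that process's cgroup file (for example /proc/3852/cgroup).
# Assumes cgroup v1 with the pids controller mounted at /sys/fs/cgroup/pids.
pids_max_path() {
    awk -F: '
        $2 == "pids" {
            path = $3
            sub(/\/init\.scope$/, "", path)   # apply the limit at the container level
            print "/sys/fs/cgroup/pids" path "/pids.max"
            exit
        }
    ' "$1"
}

# Usage:
#   pids_max_path /proc/3852/cgroup
```

For a pids path of /lxc/irreplaceable-madie/init.scope this prints /sys/fs/cgroup/pids/lxc/irreplaceable-madie/pids.max, which is the file we want to read and write.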

vagrant@vagrant:~$ cat /sys/fs/cgroup/pids/lxc/irreplaceable-madie/pids.max
max
vagrant@vagrant:~$ sudo sh -c 'echo 1000 > /sys/fs/cgroup/pids/lxc/irreplaceable-madie/pids.max'
vagrant@vagrant:~$ cat /sys/fs/cgroup/pids/lxc/irreplaceable-madie/pids.max
1000

We saw that the previous value was ‘max’ and now we’ve reduced that to 1000. This should stop our rogue user from fork bombing our container!

vagrant@vagrant:~$ lxc info irreplaceable-madie | grep Processes
  Processes: 26
vagrant@vagrant:~$ lxc exec irreplaceable-madie -- perl -e 'fork while fork' \&
vagrant@vagrant:~$ lxc info irreplaceable-madie | grep Processes
  Processes: 26

It looks like nothing happened, and that’s great. The processes spawned out of control until they hit our limit; after that, further forks failed and the processes exited. At this point you may object to this test. After all, I never proved that the fork bomb worked when the limit was set to max. If you are feeling adventurous, you can try that now, but you will likely lose control of the container and the Vagrant box and have to restart them (possibly even through the VirtualBox interface). Before we make that change, I have a confession. There is a much easier way to control cgroup limits than by editing the /sys/fs filesystem directly (thankfully).

vagrant@vagrant:~$ cat /sys/fs/cgroup/pids/lxc/irreplaceable-madie/pids.max
1000
vagrant@vagrant:~$ lxc config set irreplaceable-madie limits.processes 100
vagrant@vagrant:~$ cat /sys/fs/cgroup/pids/lxc/irreplaceable-madie/pids.max
100
vagrant@vagrant:~$ lxc config unset irreplaceable-madie limits.processes
vagrant@vagrant:~$ cat /sys/fs/cgroup/pids/lxc/irreplaceable-madie/pids.max
max

How did I know that I should provide ‘limits.processes’ as the key to the set and unset subcommands? Well, by reading the documentation, of course! The ‘config’ subcommand of ‘lxc’ provides a nice interface to the cgroup filesystem, and with it we were able to change and then remove the process limit that we had placed on the container. At this point you could try the fork bomb again in the container if you really want to see what happens without limits. Be warned, though: it isn’t pretty! By the way, if you want to drop down into the container, you can do so like this.

vagrant@vagrant:~$ lxc exec irreplaceable-madie bash

Now you are free to issue any command you want within the confines of the container you created.

So just to recap, we launched an LXC container and used cgroups to set a limit on the number of processes that could be spawned inside of it. We gained some additional insight into what the kernel is doing by looking at the /proc filesystem, and saw that we could manipulate the kernel in real time using the /sys filesystem. Now I encourage you to spend some time trying to place other limits on the container. For instance, are you able to constrain the amount of memory or CPU available to it? And what are all those other lxc subcommands about? Go and explore!
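
If you’d like a starting point for that exploration, here is one possible sketch. It applies ‘limits.processes’ (used earlier) along with ‘limits.memory’ and ‘limits.cpu’ through ‘lxc config set’. The values are arbitrary examples I picked, the function name ‘apply_limits’ is hypothetical, and by default the commands are only printed so you can review them first; check the LXD documentation for the value formats your version accepts.

```shell
#!/bin/sh
# Sketch only: apply a few limits to one container through "lxc config set".
# The values are arbitrary examples. With DRY_RUN=1 (the default here) the
# commands are printed instead of executed.
apply_limits() {
    name="$1"
    for pair in limits.processes=100 limits.memory=256MB limits.cpu=1; do
        key="${pair%%=*}"      # text before the first "="
        value="${pair#*=}"     # text after the first "="
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "lxc config set $name $key $value"
        else
            lxc config set "$name" "$key" "$value"
        fi
    done
}

# Usage:
#   apply_limits irreplaceable-madie                # print the commands
#   DRY_RUN=0; apply_limits irreplaceable-madie     # actually run them
```

After running it for real, ‘lxc info’ and the files under /sys/fs/cgroup are good places to check whether the limits took effect.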