LxcSecurity

Revision 6 as of 2012-01-04 19:55:18

Security issues and mitigations with lxc

Introduction

Lxc creates lightweight 'containers' mainly using kernel support for namespaces and control groups. Namespaces provide isolation (for instance, by not providing any name by which to reference a particular file), control groups provide various limits (for instance, refusing access to /dev/sda), and LSMs can clamp down on permissions with a mandatory access control policy. POSIX capabilities, in particular the bounding set, can be used to refuse some privileges. This is less than ideal, however, because most privileges are desirable when targeted at resources owned by the container. Finally, seccomp2 can refuse the container access to some kernel functionality (system calls).

However, containers always (by design) share the host's kernel. Therefore, any vulnerability in a kernel interface can be exploited by the container to harm the host, unless the container is forbidden the use of that interface (e.g. using seccomp2).
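
As an illustration of the capability bounding set mentioned above, the following is a minimal Python sketch that decodes the CapBnd line of /proc/<pid>/status and reports whether a given capability is still available. The function names and the small CAP_BITS table are hypothetical helpers written for this page; the bit numbers are the ones defined in <linux/capability.h>. It only inspects a mask and does not change any privileges.

```python
# Capability bit numbers as defined in <linux/capability.h>.
# Only a few are listed here; the table is illustrative, not complete.
CAP_BITS = {
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_ADMIN": 21,
    "CAP_SYS_BOOT": 22,
}


def parse_capbnd(status_text):
    """Return the bounding-set mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            # The value is a hexadecimal bitmask, one bit per capability.
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapBnd line found")


def in_bounding_set(mask, cap_name):
    """Return True if the named capability is still present in the mask."""
    return bool(mask & (1 << CAP_BITS[cap_name]))
```

Run inside a container, one would pass the contents of /proc/self/status to parse_capbnd() and then query the capabilities of interest with in_bounding_set().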

This information is aimed at the 12.04 (precise) release.

Issues considered for 12.04

Below, top-level items are security concerns, and more deeply nested items are potential or actual mitigations.

  • specific proc files like /proc/sysrq-trigger (should the harmful ones be listed in a wiki page?)
    • User namespaces will make those files owned by a root user other than the container's root. They are unlikely to be complete for 12.04, however.
    • Apparmor
      • Can deny access to these files by pathname
      • To prevent the container from bypassing this with 'mount --move', apparmor will, for 12.04, have a new rule to enforce mount locations
      • After 12.04, rules will be added to enforce pathnames relative to a particular fstype's mount root (e.g. type=procfs sysrq-trigger)
  • reboot/shutdown system call
    • A patch is being sent upstream so that reboot in a container is handled specially. This will prevent this problem.
    • The alternative is to remove CAP_SYS_BOOT from the container's capability bounding set.
  • Filesystems:
    • a guest can remount the host's devpts. Simply doing 'mount -t devpts devpts /dev/pts' will overlay its private (newinstance) devpts instance with the host's ('global') instance. This needs to be handled with either an LSM or user namespaces.
    • securityfs shouldn't be mountable by a guest
    • debugfs shouldn't be mountable by a guest
    • binfmt_misc shouldn't be mountable by a guest
  • apparmor profile in container
    • For 12.04, there will be a choice
      • either
        • container runs in a host-defined apparmor profile
        • or container runs in its own profile
      • but it won't be able to have its own profile and still be confined by host
      • that will be fixed for 14.04
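
The filesystem concerns listed above can be checked for after the fact. The following is a rough Python sketch that scans the contents of /proc/mounts for the filesystem types named in the list (securityfs, debugfs, binfmt_misc). The function name and the RISKY_FSTYPES set are hypothetical, introduced only for illustration. It detects mounts that already exist; it does not prevent a guest from mounting them, and it does not attempt to tell a private devpts instance from the host's.

```python
# Filesystem types the list above says a guest should not be able to mount.
RISKY_FSTYPES = {"securityfs", "debugfs", "binfmt_misc"}


def find_risky_mounts(mounts_text):
    """Return (mount_point, fstype) pairs for risky filesystems.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mount point, fstype, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        if fstype in RISKY_FSTYPES:
            found.append((mount_point, fstype))
    return found
```

Run inside a container, one would pass the text read from /proc/mounts; an empty result means none of the listed filesystem types is currently mounted there.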

References