Security issues and mitigations with lxc


Lxc creates lightweight 'containers' mainly using kernel support for namespaces and control groups. Namespaces provide isolation (for instance, by not providing any name by which to reference a particular file), control groups impose various limits (for instance, refusing access to /dev/sda), and LSMs can clamp down on permissions with a mandatory access control policy. POSIX capabilities, in particular the bounding set, can be used to refuse some privileges; however, this is less than ideal, because most privileges are desirable when targeted at resources owned by the container. Finally, seccomp2 (seccomp filter mode) can refuse the container access to some kernel functionality (system calls).
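The bounding set mentioned above can be inspected from userspace. As a minimal sketch, assuming a Linux /proc filesystem, the Python below reads the CapBnd hex mask from /proc/<pid>/status and tests individual capability bits. Only a few capability numbers from <linux/capability.h> are hard-coded, and the helper names are hypothetical, not part of lxc.

```python
# Sketch: inspect a process's capability bounding set on Linux.
# Assumes the CapBnd field of /proc/<pid>/status, a hex bitmask in which
# bit N corresponds to capability number N. Helper names are hypothetical.

# A few capability numbers from <linux/capability.h>, for illustration.
CAP_SYS_MODULE = 16
CAP_SYS_ADMIN = 21
CAP_SYS_BOOT = 22


def parse_capbnd(status_text: str) -> int:
    """Return the CapBnd bitmask from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapBnd line found")


def has_cap(mask: int, cap: int) -> bool:
    """True if capability number `cap` is present in the bitmask."""
    return bool((mask >> cap) & 1)


def read_capbnd(path: str = "/proc/self/status") -> int:
    """Read the bounding set of the current (or given) process."""
    with open(path) as f:
        return parse_capbnd(f.read())
```

For example, has_cap(read_capbnd(), CAP_SYS_BOOT) run inside a container whose bounding set had CAP_SYS_BOOT removed would return False. Reading the mask needs no privilege; shrinking it (prctl with PR_CAPBSET_DROP) requires CAP_SETPCAP.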

However, containers will always (by design) share the same kernel as the host. Therefore, any vulnerability in a kernel interface can be exploited by the container to harm the host, unless the container is forbidden from using that interface (e.g. by seccomp2).

Considerations for 13.10

  • unprivileged creation and launch of containers will further protect the host.

Considerations for 13.04

  • more advanced seccomp integration (hopefully)
  • apparmor policy stacking allows containers to use apparmor themselves even while they are confined by apparmor on the host.
  • user namespaces will make root in a container a normal user on the host.
    • Host-owned files such as /proc/sysrq-trigger will remain owned by host root, so container root (an ordinary uid on the host) will not own them.
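The ownership effect of user namespaces can be illustrated with the kernel's uid_map format, where each line reads "<first uid inside> <first uid outside> <count>". This Python sketch assumes a hypothetical mapping of container uids 0-65535 onto host uids 100000-165535; the function names are illustrative only.

```python
# Sketch: translate uids across a user namespace boundary using uid_map
# semantics. The mapping in EXAMPLE_MAP is a hypothetical configuration.
from typing import List, Optional, Tuple

Extent = Tuple[int, int, int]  # (inside_start, outside_start, length)

EXAMPLE_MAP = "0 100000 65536\n"


def parse_uid_map(text: str) -> List[Extent]:
    """Parse the lines of a uid_map file into (inside, outside, length) extents."""
    extents = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(x) for x in line.split())
            extents.append((inside, outside, length))
    return extents


def container_to_host(uid: int, extents: List[Extent]) -> Optional[int]:
    """Return the host uid for a container uid, or None if it is unmapped."""
    for inside, outside, length in extents:
        if inside <= uid < inside + length:
            return outside + (uid - inside)
    return None


def host_to_container(uid: int, extents: List[Extent]) -> Optional[int]:
    """Return the container uid for a host uid, or None if it is unmapped."""
    for inside, outside, length in extents:
        if outside <= uid < outside + length:
            return inside + (uid - outside)
    return None
```

With this mapping, container root is host uid 100000, and host root (uid 0) has no container equivalent; a file owned by host root, such as /proc/sysrq-trigger, therefore shows up inside the container as owned by the overflow uid (usually 65534) rather than by container root.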

Changes for 12.10

  • rudimentary seccomp integration provides some protection of the host against container abuse of untrusted system calls.
  • apparmor policy customization was made easier.

Considerations for 12.04

Below are the security considerations which were taken into account during 12.04 development, and how they were addressed:

  • for 12.04, proc and sysfs are by default only mountable at /proc and /sys respectively
    • this makes pathname-based restrictions on proc and sysfs files useful, since a rule naming /proc/sysrq-trigger only matches if procfs is actually mounted at /proc
  • apparmor denies the container access to specific proc files such as /proc/sysrq-trigger
    • Apparmor
      • Can deny access to these files by pathname
      • To prevent the container from bypassing this with 'mount --move', apparmor will, for 12.04, gain a new rule type to enforce mount locations
      • After 12.04, rules will be added to enforce pathnames relative to a particular fstype's mount root (e.g. type=procfs sysrq-trigger)
  • reboot/shutdown system call
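    • Container reboot is now specially handled in the kernel: a reboot issued inside the container's PID namespace terminates the container's init instead of rebooting the host.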
    • The alternative (removing CAP_SYS_BOOT from the container's capability bounding set) is a fallback.
  • Filesystems:
    • a guest can mount the host's devpts. Simply running 'mount -t devpts devpts /dev/pts' (without -o newinstance) will overlay its private (newinstance) devpts instance with the host's ('global') instance.
      • apparmor will deny devpts mounts by the container
        • the new devpts is mounted by lxc-start before init is executed
        • the profile that refuses devpts mounts only takes effect when init is executed, so lxc-start's own devpts mount is not affected
    • securityfs shouldn't be mountable by a guest
    • debugfs shouldn't be mountable by a guest
    • binfmt_misc shouldn't be mountable by a guest
  • apparmor profiles inside a container
    • For 12.04, containers run in a restricted profile
    • After 13.04, apparmor policy stacking will remove this restriction
  • a container can remount / at the superblock level (for instance read-only), and that superblock is usually shared with the host and other containers.
    • Denied both by apparmor policy and by an lxc patch that keeps a file in the container's '/' open for writing, so that a read-only remount fails.
  • a container can run 'udevadm trigger --action=add' to cause a udev storm (and device resets) on the host.
    • Denied by apparmor rules refusing writes under /sys.
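Why a pathname rule such as a deny on /proc/sysrq-trigger only works while proc is pinned to /proc, and why fstype-relative rules were planned, can be shown with a small model. This is a hypothetical Python sketch, not AppArmor's rule syntax or matching code: a file is described by the mount point of its filesystem, the filesystem type (the kernel's name for procfs is "proc"), and its path relative to that filesystem's root.

```python
# Sketch: pathname-based rules vs. fstype-relative rules.
# Hypothetical model only; not AppArmor syntax or implementation.
import posixpath

PATH_DENY = {"/proc/sysrq-trigger"}          # rule keyed on the full pathname
FSTYPE_DENY = {("proc", "sysrq-trigger")}    # rule keyed on (fstype, path in fs)


def full_path(mountpoint: str, rel: str) -> str:
    """Pathname under which a file appears, given where its fs is mounted."""
    return posixpath.join(mountpoint, rel)


def denied_by_path(mountpoint: str, rel: str) -> bool:
    """A pathname rule only sees the final path, so it depends on the mount point."""
    return full_path(mountpoint, rel) in PATH_DENY


def denied_by_fstype(fstype: str, rel: str) -> bool:
    """An fstype-relative rule ignores where the filesystem is mounted."""
    return (fstype, rel) in FSTYPE_DENY
```

If a container could mount or move procfs to, say, /mnt/p, the pathname rule would no longer match /mnt/p/sysrq-trigger, while the fstype-relative rule still would; restricting mount locations closes that gap for the pathname rules.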
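The kernel-side reboot handling can be sketched from the monitor's point of view. In mainline kernels since 3.4, reboot(2) issued inside a non-initial PID namespace terminates that namespace's init with SIGHUP (restart) or SIGINT (power off or halt) instead of acting on the host, so the process waiting on the container's init can tell the two apart. The Python below is a minimal sketch under that assumption; classify_init_exit and its return strings are hypothetical and not taken from lxc-start.

```python
# Sketch: interpret the waitpid() status of a container's init process.
# Assumes the Linux >= 3.4 PID-namespace reboot behaviour described above.
# Function name and return values are hypothetical.
import os
import signal


def classify_init_exit(status: int) -> str:
    """Map a raw waitpid() status of the container's init to an action."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig == signal.SIGHUP:
            return "reboot"      # reboot(RESTART/RESTART2) inside the container
        if sig == signal.SIGINT:
            return "shutdown"    # reboot(POWER_OFF/HALT) inside the container
        return "killed"          # init died from some other signal
    if os.WIFEXITED(status):
        return "exited"          # init exited normally
    return "unknown"
```

A monitor loop could call os.waitpid on the container's init and restart the container when this returns "reboot"; the host itself is never rebooted, which is why dropping CAP_SYS_BOOT is only needed as a fallback.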
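The mount restrictions listed above can be summarized as a single decision function. This is a hypothetical Python model of the intent, not AppArmor policy or lxc code: proc and sysfs may only end up at /proc and /sys, and devpts, securityfs, debugfs and binfmt_misc may not be mounted by the container at all. Because the check looks at where a filesystem of a given type ends up, it covers 'mount --move' as well as fresh mounts.

```python
# Sketch: container mount policy modelled on the 12.04 considerations.
# Hypothetical model only; names and structure are illustrative.
from dataclasses import dataclass

# Filesystem types that may only appear at one fixed location.
ALLOWED_TARGETS = {"proc": "/proc", "sysfs": "/sys"}
# Filesystem types the container may not mount at all.
DENIED_FSTYPES = {"devpts", "securityfs", "debugfs", "binfmt_misc"}


@dataclass(frozen=True)
class MountRequest:
    fstype: str   # type of the filesystem being mounted or moved, e.g. "proc"
    target: str   # where it would end up inside the container


def mount_allowed(req: MountRequest) -> bool:
    """Return True if the modelled policy permits this mount or move."""
    if req.fstype in DENIED_FSTYPES:
        return False
    if req.fstype in ALLOWED_TARGETS:
        return req.target == ALLOWED_TARGETS[req.fstype]
    # Other filesystem types are outside the scope of this sketch.
    return True
```

Since lxc-start mounts the container's private devpts instance before executing init, that is before this kind of restriction applies, refusing devpts mounts does not prevent the container from having its own /dev/pts.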


LxcSecurity (last edited 2012-11-26 19:34:51 by serge-hallyn)