LxcSecurity

Security issues and mitigations with lxc

Introduction

Lxc creates lightweight 'containers' mainly using kernel support for namespaces and control groups. Namespaces provide isolation (for instance, by giving the container no name through which to reference a particular file), control groups impose limits (for instance, refusing access to /dev/sda), and LSMs can restrict permissions further with a mandatory access control policy. POSIX capabilities, in particular the bounding set, can be used to withhold some privileges. However, this is less than ideal, because most privileges are desirable when they are targeted at resources owned by the container. Finally, seccomp2 (seccomp mode 2, i.e. system call filtering) can deny the container access to some kernel functionality (system calls).
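The bounding set mentioned above is visible for each process in the CapBnd line of /proc/<pid>/status. The following is a minimal Python sketch that decodes that mask and checks whether CAP_SYS_BOOT is still present. CAP_SYS_BOOT is the capability discussed under reboot/shutdown further down. The helper names are hypothetical, and the sketch assumes a Linux /proc.

```python
# Capability numbers as defined in <linux/capability.h> (only the two this page cares about).
CAP_SYS_ADMIN = 21
CAP_SYS_BOOT = 22


def parse_cap_mask(hex_mask: str) -> int:
    """Convert a hex mask such as the CapBnd value in /proc/<pid>/status to an int."""
    return int(hex_mask, 16)


def has_cap(mask: int, cap: int) -> bool:
    """Return True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def read_bounding_set(pid: str = "self") -> int:
    """Return the capability bounding set of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapBnd:"):
                return parse_cap_mask(line.split()[1])
    raise RuntimeError(f"no CapBnd line in /proc/{pid}/status")
```

Consider a container whose capability bounding set has had CAP_SYS_BOOT removed. Inside it, `has_cap(read_bounding_set(), CAP_SYS_BOOT)` would be expected to return False.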

However, containers will always, by design, share the same kernel as the host. Therefore any vulnerability in the kernel interface can be exploited by the container to harm the host, unless the container is forbidden the use of that interface (e.g. using seccomp2).
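Host and containers run on one kernel, so what sets a container apart is which namespaces its processes belong to. The namespaces(7) man page documents how to test this. Two processes are in the same namespace exactly when the device and inode numbers of their /proc/<pid>/ns/<kind> links match. Below is a minimal Python sketch of that check; the helper names are hypothetical.

```python
import os
from typing import Tuple


def ns_id(pid: str, kind: str) -> Tuple[int, int]:
    """Identify the namespace of type `kind` ('mnt', 'uts', 'ipc', 'net', 'pid', 'user')
    that process `pid` belongs to, as the (st_dev, st_ino) pair of /proc/<pid>/ns/<kind>."""
    st = os.stat(f"/proc/{pid}/ns/{kind}")  # follows the link to the namespace inode
    return st.st_dev, st.st_ino


def shares_namespace(pid_a: str, pid_b: str, kind: str) -> bool:
    """Two processes are in the same namespace exactly when these pairs are equal."""
    return ns_id(pid_a, kind) == ns_id(pid_b, kind)
```

Run this as root on the host, calling `shares_namespace("1", pid, "mnt")` with `pid` standing for a container's init process (a placeholder). The call would be expected to return False. Even so, `os.uname().release` reports the same kernel on both sides, because the kernel itself is shared.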

Considerations for 13.10

  • unprivileged creation and launch of containers further protects the host.

Considerations for 13.04

  • more advanced seccomp integration (hopefully)
  • apparmor policy stacking allows containers to use apparmor themselves even while they are apparmor-confined by the host.
  • user namespaces will make root in a container a normal user on the host.
    • Host-owned files like /proc/sysrq-trigger will be owned by host root, so container root will not have ownership.
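With user namespaces, each container process's ids are translated through the ranges in /proc/<pid>/uid_map. The file has one 'inside-id outside-id length' line per range. The outside ids are relative to the namespace of whoever reads the file, so read it from the host to get host uids. The following is a minimal sketch that assumes a hypothetical container mapped with '0 100000 65536'. It shows that container root corresponds to an ordinary, unprivileged host uid.

```python
from typing import List, Optional, Tuple

IdRange = Tuple[int, int, int]  # (first id inside, first id outside, length)


def parse_id_map(text: str) -> List[IdRange]:
    """Parse the contents of /proc/<pid>/uid_map: one 'inside outside count' triple per line."""
    ranges: List[IdRange] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        inside, outside, count = (int(f) for f in fields)
        ranges.append((inside, outside, count))
    return ranges


def to_outside_uid(uid: int, ranges: List[IdRange]) -> Optional[int]:
    """Translate a uid inside the namespace to the uid outside it, or None if it is unmapped."""
    for inside, outside, count in ranges:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    # Unmapped ids have no owner outside the namespace (the kernel shows them as the overflow uid).
    return None
```

With that example map, container uid 0 comes out as host uid 100000. Host-root-owned files such as /proc/sysrq-trigger therefore remain out of container root's reach.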

Changes for 12.10

  • rudimentary seccomp integration offers the host some protection against containers abusing untrusted system calls.
  • apparmor policy customization was made easier.
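The seccomp integration works from a policy file that lxc turns into a kernel seccomp filter. This sketch assumes the version 1 format described in lxc's configuration documentation. That format has a line containing 1, a line containing whitelist, and then one numeric system call number per line. The sketch only models the allow/deny decision; actual enforcement happens in the kernel. The helper names are hypothetical.

```python
from typing import Set


def parse_v1_policy(text: str) -> Set[int]:
    """Parse a version 1 policy: a '1' line, a 'whitelist' line, then one syscall number per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines[:2] != ["1", "whitelist"]:
        raise ValueError("expected a version 1 whitelist policy")
    return {int(ln) for ln in lines[2:]}


def is_allowed(syscall_nr: int, whitelist: Set[int]) -> bool:
    """Whitelist semantics: any system call not listed is refused."""
    return syscall_nr in whitelist
```

System call numbers are architecture-specific. A numeric whitelist written for one architecture therefore does not carry over to another.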

Considerations for 12.04

Below are the security considerations which were taken into account during 12.04 development, and how they were addressed:

  • for 12.04, /proc and /sys are by default only mountable under /proc and /sys
    • this is to make pathname-based restrictions on proc and sysfs files useful
  • apparmor denies use by container of specific proc files like /proc/sysrq-trigger
    • Apparmor
      • Can deny access to these files by pathname
      • To prevent the container from bypassing this with 'mount --move', apparmor will, for 12.04, have a new rule to enforce mount locations
      • After 12.04, rules will be added to enforce pathnames relative to a particular fstype's mount root (e.g. type=procfs sysrq-trigger)
  • reboot/shutdown system call
    • Container reboot is now specially handled in the kernel.
    • The alternative (removing CAP_SYS_BOOT from the container's capability bounding set) is a fallback.
  • Filesystems:
    • a guest can remount the host's devpts. Simply doing 'mount -t devpts devpts /dev/pts' will overlay its private (newinstance) devpts instance with the host's ('global') instance.
      • apparmor will deny devpts mounts to the container
        • the new devpts is mounted by lxc-start before init is executed
        • it is the execution of init that transitions the container into the profile which refuses devpts mounts.
    • securityfs shouldn't be mountable by a guest
    • debugfs shouldn't be mountable by a guest
    • binfmt_misc shouldn't be mountable by a guest
  • apparmor profile in container
    • For 12.04, containers run in a restricted profile and cannot use apparmor themselves
    • After 13.04, apparmor policy stacking will remove this restriction
  • a container can do a superblock-level remount of /, which is usually shared with the host and other containers.
    • Denied both by apparmor policy and by an lxc patch which keeps a file in the container's '/' pinned for write.
  • a container can run 'udevadm trigger --action=add' to cause a udev storm (and device resets) on the host.
    • Prevented by apparmor denying writes under /sys.

LxcSecurity (last edited 2012-11-26 19:34:51 by serge-hallyn)