== Using Linux Containers in Natty -- hallyn ==

{{{#!IRC
[19:01] Logs for this session will be available at http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html following the conclusion of the session.
[19:02] Ok, hey all
[19:03] I'm going to talk about containers on natty.
[19:03] In the past, that is, up until lucid, there were some constraints which made containers more painful to administer -
[19:03] i.e. you couldn't safely upgrade udev
[19:03] that's now gone!
[19:04] but, let me start at the start
[19:04] containers, for anyone really new, are a way to run what appear to be different VMs, but without the overhead of an OS for each VM, and without any hardware emulation
[19:04] so you can fit a lot of containers on old hardware with little overhead
[19:04] they are similar to openvz and vserver - they're not competition, though.
[19:05] rather, they're the ongoing work to upstream the functionality from vserver and openvz
[19:05] Containers are a userspace fiction built on top of some nifty kernel functionality.
[19:05] There are two popular implementations right now:
[19:05] the libvirt lxc driver, and liblxc (or just 'lxc') from lxc.sf.net
[19:06] Here, I'm talking about lxc.sf.net
[19:06] All right, in order to demo some lxc functionality, I set up a stock natty VM on amazon. You can get to it as:
[19:06] ssh ec2-50-17-73-23.compute-1.amazonaws.com -l guest
[19:06] password is 'none'
[19:06] that should get you into a read-only screen session. To get out, hit '~.' to kill ssh.
[19:07] One of the kernel pieces used by containers is namespaces.
[19:07] You can use just the namespaces (for fun) using 'lxc-unshare'
[19:07] it's not a very user-friendly command, though.
[19:07] because it's rarely used...
[19:08] what I just did there on the demo is to unshare my mount, pid, and utsname (hostname) namespaces
[19:08] using "lxc-unshare -s 'MOUNT|PID|UTSNAME' /bin/bash"
[19:08] lxc-unshare doesn't remount /proc for you, so I had to do that. Once I've done that, ps only shows tasks in my pid namespace
[19:08] also, I can change my hostname without changing the hostname on the rest of the system
[19:09] When I exited the namespace, I was brought back to a shell with the old hostname
[19:10] all right, another thing used by containers is bind mounts. Not much to say about them, let me just do a quick demo of playing with them:
[19:10] ToyKeeper asked: Will there be a log available for this screen session?
[19:11] yes,
[19:11] oh, no. sorry
[19:11] didn't think to set that up
[19:11] hm,
[19:11] ok, I'm logging it as of now. I'll decide where to put it later. thanks.
[19:12] nothing fancy, just bind-mounting filesystems
[19:13] which is a way of saving a lot of space, if you share /usr and /lib amongst a lot of containers
[19:13] anyway, moving on to actual usage
[19:14] Typically there are 3 ways that I might set up networking for a container
[19:14] Often, if I'm lazy or already have it set up, I'll re-use the libvirt bridge, virbr0, to bind container NICs to
[19:16] well, at least apt-get worked :)
[19:16] If I'm on a laptop using wireless, I'll usually go that route, because you can't directly bridge a wireless NIC.
[19:16] And otherwise I'd have to set up my own iptables rules to do the forwarding from the containers' bridge to the host NIC
[19:17] If I'm on a 'real' host, I'll bridge the host's NIC and use that for containers.
}}}
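The namespace and bind-mount demos above come down to a handful of shell commands. A minimal sketch of that part of the session (the guest rootfs path is illustrative):

{{{
# Unshare the mount, pid, and utsname namespaces into a new shell
sudo lxc-unshare -s 'MOUNT|PID|UTSNAME' /bin/bash

# Inside the new namespaces: remount /proc so ps only sees our pid
# namespace, and change the hostname without touching the host's
mount -t proc proc /proc
hostname demo
ps aux    # now shows only tasks in this pid namespace
exit      # back to the original namespaces and the old hostname

# Bind mounts: share a read-mostly tree such as /usr among containers
mount --bind /usr /var/lib/lxc/guest1/rootfs/usr
}}}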
{{{#!IRC
[19:17] that's what lxc-veth.conf does
[19:17] So first you have to set up /etc/network/interfaces to have br0 be a bridge,
[19:18] have eth0 not have an address, and make eth0 a bridge port on br0
[19:18] as seen on the demo
[19:18] Since that's set up, I can create a bridged container just using:
[19:18] 'lxc-create -f /etc/lxc-veth.conf -n nattya -t natty'
[19:18] nattya is the name of the container,
[19:18] natty is the template I'm using
[19:18] and /etc/lxc-veth.conf is the config file to specify how to network
[19:19] ruh roh
[19:20] so lxc-create is how you create a new container
[19:20] The rootfs and config files for each container are in /var/lib/lxc
[19:20] you see there are three containers there - natty1, which I created before this session, and nattya and nattyz which I just created
[19:21] The config file under /var/lib/lxc/natty1 shows some extra information,
[19:21] including how many ttys to set up,
[19:21] and which devices to allow access to
[19:21] the first devices line, 'lxc.cgroup.devices.deny = a', means 'by default, don't allow any access to any device.'
[19:21] from there any other entries are whitelist entries
[19:22] kim0 asked: Can I run a completely different system like centos under lxc on ubuntu ?
[19:22] yes, you can, and many people do.
[19:22] The main problem, usually, is in actually first setting up a container with that distro which works
[19:23] You can't 100% use a stock iso install and have it boot as a container
[19:23] It used to be there was a lot of work you had to do to make that work,
[19:23] but now we're down to very few things. In fact, for ubuntu natty, we have a package called 'lxcguest'
[19:23] if you take a stock ubuntu natty image,
[19:23] and install 'lxcguest', then it will allow that image to boot as a container
[19:24] It detects that it is in a container (based on a boot argument provided by lxc-start),
[19:24] and based on that, if it is in a container, it actually only does two things now:
[19:24] 1. it starts a console on /dev/console, so that 'lxc-start' itself gets a console (like you see when I start a container)
[19:25] 2. it changes /lib/init/fstab to one with fewer filesystems,
[19:25] because there are some which you cannot or should not mount in a container.
[19:25] now, lxc ships with some 'templates'.
[19:25] these are under /usr/lib/lxc/templates
[19:26] some of those templates, however, don't quite work right.
}}}
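Pulling the bridged setup together, here is a sketch of what the host networking and the container network config might look like. The DHCP choice is an assumption, and the lxc.network.* keys follow the lxc.conf(5) syntax of that era:

{{{
# /etc/network/interfaces -- eth0 carries no address; br0 bridges it
auto eth0
iface eth0 inet manual

auto br0
iface br0 inet dhcp
    bridge_ports eth0

# /etc/lxc-veth.conf -- give each container a veth NIC attached to br0
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
}}}

With that in place, `lxc-create -f /etc/lxc-veth.conf -n nattya -t natty` creates a container whose NIC sits on the same segment as the host's eth0.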
{{{#!IRC
[19:26] So a next work item we want to tackle is to make those all work better, and add more
[19:26] let's take a look at the lxc-natty one:
[19:27] it takes a MIRROR option, which I always use at home, which lets me point it at an apt-cacher-ng instance
[19:28] it starts by doing a debootstrap of a stock natty image into /var/cache/lxc/natty/
[19:28] so then, every time you create another container with the natty template, it will rsync that image into place
[19:29] then it configures it, setting hostname, setting up interfaces,
[19:29] shuts up udev,
[19:29] since the template by default creates 4 ttys, we get rid of /etc/init/tty5 and 6
[19:30] since we're not installing lxcguest, we just empty out /lib/init/fstab,
[19:30] actually, that may be a problem
[19:30] upstart upgrades may overwrite that
[19:30] so we should instead have the lxc-natty template always install the lxcguest package
[19:30] (note to self)
[19:30] and finally, it installs the lxc configuration, which is that config file we looked at before with device access etc
[19:31] ok, I've been rambling, let me look for and address any/all questions
[19:31] kapil asked: What's the status of using lxc via libvirt?
[19:31] good question, zul has actually been working on that.
[19:32] libvirt-lxc in natty is fixed so that when you log out from the console, you don't kill the container any more
[19:32] secondly, you can use the same lxcguest package I mentioned before in libvirt-lxc,
[19:32] so you can pretty easily debootstrap an image, chroot to it to install lxcguest, and then use it in libvirt
[19:33] we still may end up writing a new libvirt lxc driver, as an alternative to the current one, which just calls out to liblxc, so that libvirt and liblxc can be used to manipulate the same containers
[19:33] but still haven't gotten to that
[19:34] kim0 asked: can I live migrate a lxc container
[19:34] nope
[19:34] for that, we'll first need checkpoint/restart.
[19:34] I have a ppa with some kernel and userspace pieces - basically packaging the current upstream efforts. But nothing upstream, nothing in natty, not very promising short-term
[19:35] ToyKeeper asked: Why would you want regular ttys in a container? Can't the host get a guest shell similar to openvz's "vzctl enter $guestID" ?
[19:35] nope,
[19:35] if the container is set up right, then you can of course ssh into it;
[19:35] or you can run lxc-start in a screen session so you can get back to it like that,
[19:36] what the regular 'lxc.tty = 4' gives you is the ability to do 'lxc-console' to log in
[19:36] as follows:
[19:36] I start the container with '-d' to not give me a console on my current tty
[19:36] then lxc-console -n natty1 connects me to the tty...
[19:37] ctrl-a q exits it
[19:37] now, the other way you might *want* to enter a container, which I think vzctl enter does,
[19:37] is to actually move your current task into the container
[19:37] That currently is not possible
[19:37] there is a kernel patch, being driven now by dlezcano, to make that possible, and a patch to lxc to use it via the 'lxc-attach' command.
[19:38] but the kernel patch is not yet accepted upstream
[19:38] so you cannot 'enter' a container
[19:38] rye asked: Are there any specific settings for cgroup mount for the host?
[19:38] Currently I just mount all cgroups.
}}}
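A sketch of the template and console workflow just described. The mirror URL is a made-up example, and passing MIRROR as an environment variable is an assumption about how the natty template reads it:

{{{
# Create a container, pointing debootstrap at a local apt-cacher-ng
MIRROR=http://localhost:3142/archive.ubuntu.com/ubuntu \
    lxc-create -f /etc/lxc-veth.conf -n natty1 -t natty

# Start detached (no console on the current tty), then attach to a tty
lxc-start -n natty1 -d
lxc-console -n natty1    # log in on one of the container's ttys;
                         # ctrl-a q detaches again
}}}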
{{{#!IRC
[19:39] Using fstab on the demo machine, or just 'mount -t cgroup cgroup /cgroup'
[19:39] the ns cgroup is going away soon,
[19:39] so when you don't have the ns cgroup mounted, then you'll need cgroup.clone_children to be 1
[19:40] however, you don't need that in natty. in n+1 you probably will.
[19:40] kim0 asked: How safe is it to let random strangers ssh into containers as root ? how safe is it to run random software inside containers .. can they break out
[19:40] not safe at all
[19:40] If you google for 'lxc lsm' you can find some suggestions for using selinux or smack to clamp down
[19:41] and, over the next year or two, I'm hoping to keep working on, and finally complete, the 'user namespaces'
[19:41] with user namespaces, you, as user 'kim0' and without privilege, would create a container. root in that container would have full privilege over things which you yourself own
[19:41] So any files owned by kim0 on the host; anything private to your namespaces, like your own hostname;
[19:41] BUT,
[19:42] even when that is done, there is another consideration: nothing is sitting between your users and the kernel
[19:42] so any syscalls which have vulnerabilities - and there are always some - can be exploited
[19:42] now,
[19:43] the fact is of course that similar concerns should keep you vigilant over other virtualization - kvm/vmware/etc - as well. The video driver, for instance, may allow the guest user to break out.
[19:43] kim0 asked: Can one enforce cpu/memory/network limits (cgroups?) on containers
[19:44] you can lock a container into one or several cpus,
[19:44] you can limit its memory,
[19:44] you can, it appears (this is new to me) throttle block io (which has been in the works for years :)
[19:45] the net_cls.classid has to do with some filtering based on packet labels. I've looked at it in the past, but never seen evidence of anyone using it
[19:46] for documentation on cgroups, I would look at Documentation/cgroups in the kernel source
[19:46] oh yes, and of course you can control access to devices
[19:47] you remove device access by writing to /cgroup/<container>/devices.deny, an entry of the form
[19:47] major:minor rwm
[19:47] where r=read, w=write, m=mknod
[19:47] oh, I lied,
[19:48] first is 'a' for any, 'c' for char, or 'b' for block,
[19:48] then major:minor, then rwm
[19:48] you can see the current settings for a cgroup in /cgroup/<container>/devices.list
[19:48] and allow access by writing to devices.allow
[19:48] sveiss asked: is there any resource control support integrated with containers? Limiting CPU, memory/swap, etc... I'm thinking along the lines of the features provided by Solaris, if you're familiar with those
[19:49] you can pin a container to a cpu, and you can track its usage, but you cannot (last I knew) limit % cpu
[19:49] oh, there is one more cgroup I've not mentioned, 'freezer', which as the name suggests lets you freeze a task.
[19:50] so I can start up the natty1 guest and then freeze it like so
[19:50] lxc-freeze just does 'echo "FROZEN" > /cgroup/$container/freezer.state' for me
[19:50] lxc-unfreeze thaws it
[19:51] can't get a console when it's frozen :)
[19:51] There are 10 minutes remaining in the current session.
}}}
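All of these limits are plain file writes into the mounted hierarchy. A minimal sketch, assuming all cgroup controllers are mounted together at /cgroup and the container is named natty1; the cpu and memory values are arbitrary examples:

{{{
mount -t cgroup cgroup /cgroup

# Pin the container to cpu 0 and cap its memory at 256 MB
echo 0 > /cgroup/natty1/cpuset.cpus
echo 268435456 > /cgroup/natty1/memory.limit_in_bytes

# Device whitelist: deny everything, then allow /dev/null (char 1:3)
echo a > /cgroup/natty1/devices.deny
echo 'c 1:3 rwm' > /cgroup/natty1/devices.allow
cat /cgroup/natty1/devices.list

# Freeze and thaw every task in the container
echo FROZEN > /cgroup/natty1/freezer.state
echo THAWED > /cgroup/natty1/freezer.state
}}}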
{{{#!IRC
[19:51] there are a few other lxc-* commands to help administration
[19:52] lxc-ls lists the available containers in the first line,
[19:52] and the active ones in the second
[19:52] lxc-info just shows a container's state
[19:52] lxc-ps shows tasks in the container, but you have to treat it just right
[19:53] lxc-ps just does 'ps' and shows you if any tasks in your bash session are in a container :)
[19:53] lxc-ps --name natty1 shows me the processes in container natty1
[19:53] and lxc-ps -ef shows me all tasks, prepended by the container any task is in
[19:53] lxc-ps --name natty1 --forest is the prettiest :)
[19:54] now, I didn't get a chance to try this in advance so I will probably fail, but
[19:54] hm
[19:56] There are 5 minutes remaining in the current session.
[19:56] there is the /lib/init/fstab which the lxcguest package will use
[19:57] ok, what I did there,
[19:57] was I had debootstrapped a stock image into 'demo1', I just installed lxcguest,
[19:57] and fired it up as a container
[19:57] only problem is I don't know the password :)
[19:57] kim0 asked: Any way to update the base natty template that gets rsync'ed to create new guests
[19:58] sure, chroot to /var/cache/lxc/natty and apt-get update :)
[19:58] ok, thanks everyone
[19:59] Thanks a lot .. It's been a great deep dive session
}}}
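For quick reference, the administrative commands shown in the session, plus the cache refresh trick; the paths assume a stock natty lxc install:

{{{
lxc-ls                           # line 1: all containers; line 2: running ones
lxc-info -n natty1               # show the container's state
lxc-ps --name natty1             # processes inside natty1
lxc-ps -ef                       # all tasks, prefixed with their container
lxc-ps --name natty1 --forest    # tree view of the container's processes

# Refresh the cached base image that the natty template rsyncs from
chroot /var/cache/lxc/natty apt-get update
}}}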