== Using Linux Containers in Natty -- hallyn ==

{{{#!IRC
[19:01] Logs for this session will be available at http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html following the conclusion of the session.
[19:02] Ok, hey all
[19:03] I'm going to talk about containers on natty.
[19:03] In the past, that is, up until lucid, there were some constraints which made containers more painful to administer -
[19:03] i.e. you couldn't safely upgrade udev
[19:03] that's now gone!
[19:04] but, let me start at the start
[19:04] containers, for anyone really new, are a way to run what appear to be different VMs, but without the overhead of an OS for each VM, and without any hardware emulation
[19:04] so you can fit a lot of containers on old hardware with little overhead
[19:04] they are similar to openvz and vserver - they're not competition, though.
[19:05] rather, they're the ongoing work to upstream the functionality from vserver and openvz
[19:05] Containers are a userspace fiction built on top of some nifty kernel functionality.
[19:05] There are two popular implementations right now:
[19:05] the libvirt lxc driver, and liblxc (or just 'lxc') from lxc.sf.net
[19:06] Here, I'm talking about lxc.sf.net
[19:06] All right, in order to demo some lxc functionality, I set up a stock natty VM on amazon. You can get to it as:
[19:06] ssh ec2-50-17-73-23.compute-1.amazonaws.com -l guest
[19:06] password is 'none'
[19:06] that should get you into a read-only screen session. To get out, hit '~.' to kill ssh.
[19:07] One of the kernel pieces used by containers is namespaces.
[19:07] You can use just the namespaces (for fun) using 'lxc-unshare'
[19:07] it's not a very user-friendly command, though.
[19:07] because it's rarely used...
[19:08] what I just did there on the demo is to unshare my mount, pid, and utsname (hostname) namespaces
[19:08] using "lxc-unshare -s 'MOUNT|PID|UTSNAME' /bin/bash"
[19:08] lxc-unshare doesn't remount /proc for you, so I had to do that. Once I've done that, ps only shows tasks in my pid namespace
[19:08] also, I can change my hostname without changing the hostname on the rest of the system
[19:09] When I exited the namespace, I was brought back to a shell with the old hostname
[19:10] all right, another thing used by containers is bind mounts. Not much to say about them, let me just do a quick demo of playing with them:
[19:10] ToyKeeper asked: Will there be a log available for this screen session?
[19:11] yes,
[19:11] oh, no. sorry
[19:11] didn't think to set that up
[19:11] hm,
[19:11] ok, I'm logging it as of now. I'll decide where to put it later. thanks.
[19:12] nothing fancy, just bind-mounting filesystems
[19:13] which is a way of saving a lot of space, if you share /usr and /lib amongst a lot of containers
[19:13] anyway, moving on to actual usage
[19:14] Typically there are 3 ways that I might set up networking for a container
[19:14] Often, if I'm lazy or already have it set up, I'll re-use the libvirt bridge, virbr0, to bind container NICs to
[19:16] well, at least apt-get worked :)
[19:16] If I'm on a laptop using wireless, I'll usually go that route, because you can't directly bridge a wireless NIC.
[19:16] And otherwise I'd have to set up my own iptables rules to do the forwarding from the containers' bridge to the host NIC
[19:17] If I'm on a 'real' host, I'll bridge the host's NIC and use that for containers.
}}}
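The namespace and bind-mount demos above come down to a handful of shell commands. A minimal sketch of that part of the session (the guest rootfs path is illustrative):

{{{
# Unshare the mount, pid, and utsname namespaces into a new shell
sudo lxc-unshare -s 'MOUNT|PID|UTSNAME' /bin/bash

# Inside the new namespaces: remount /proc so ps only sees our pid
# namespace, and change the hostname without touching the host's
mount -t proc proc /proc
hostname demo
ps aux    # now shows only tasks in this pid namespace
exit      # back to the original namespaces and the old hostname

# Bind mounts: share a read-mostly tree such as /usr among containers
mount --bind /usr /var/lib/lxc/guest1/rootfs/usr
}}}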
{{{#!IRC
[19:17] that's what lxc-veth.conf does
[19:17] So first you have to set up /etc/network/interfaces to have br0 be a bridge,
[19:18] have eth0 not have an address, and make eth0 a bridge port on br0
[19:18] as seen on the demo
[19:18] Since that's set up, I can create a bridged container just using:
[19:18] 'lxc-create -f /etc/lxc-veth.conf -n nattya -t natty'
[19:18] nattya is the name of the container,
[19:18] natty is the template I'm using
[19:18] and /etc/lxc-veth.conf is the config file to specify how to network
[19:19] ruh roh
[19:20] so lxc-create is how you create a new container
[19:20] The rootfs and config files for each container are in /var/lib/lxc
[19:20] you see there are three containers there - natty1, which I created before this session, and nattya and nattyz which I just created
[19:21] The config file under /var/lib/lxc/natty1 shows some extra information,
[19:21] including how many ttys to set up,
[19:21] and which devices to allow access to
[19:21] the first devices line, 'lxc.cgroup.devices.deny = a', means 'by default, don't allow any access to any device.'
[19:21] from there any other entries are whitelist entries
[19:22] kim0 asked: Can I run a completely different system like centos under lxc on ubuntu ?
[19:22] yes, you can, and many people do.
[19:22] The main problem, usually, is in actually first setting up a container with that distro which works
[19:23] You can't 100% use a stock iso install and have it boot as a container
[19:23] It used to be there was a lot of work you had to do to make that work,
[19:23] but now we're down to very few things. In fact, for ubuntu natty, we have a package called 'lxcguest'
[19:23] if you take a stock ubuntu natty image,
[19:23] and install 'lxcguest', then it will allow that image to boot as a container
[19:24] It detects that it is in a container (based on a boot argument provided by lxc-start),
[19:24] and based on that, if it is in a container, it actually only does two things now:
[19:24] 1. it starts a console on /dev/console, so that 'lxc-start' itself gets a console (like you see when I start a container)
[19:25] 2. it changes /lib/init/fstab to one with fewer filesystems,
[19:25] because there are some which you cannot or should not mount in a container.
[19:25] now, lxc ships with some 'templates'.
[19:25] these are under /usr/lib/lxc/templates
[19:26] some of those templates, however, don't quite work right.
}}}
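Pulling the bridged setup together, here is a sketch of what the host networking and the container network config might look like. The DHCP choice is an assumption, and the lxc.network.* keys follow the lxc.conf(5) syntax of that era:

{{{
# /etc/network/interfaces -- eth0 carries no address; br0 bridges it
auto eth0
iface eth0 inet manual

auto br0
iface br0 inet dhcp
    bridge_ports eth0

# /etc/lxc-veth.conf -- give each container a veth NIC attached to br0
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
}}}

With that in place, `lxc-create -f /etc/lxc-veth.conf -n nattya -t natty` creates a container whose NIC sits on the same segment as the host's eth0.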
{{{#!IRC
[19:26] So a next work item we want to tackle is to make those all work better, and add more
[19:26] let's take a look at the lxc-natty one:
[19:27] it takes a MIRROR option, which I always use at home, which lets me point it at an apt-cacher-ng instance
[19:28] it starts by doing a debootstrap of a stock natty image into /var/cache/lxc/natty/
[19:28] so then, every time you create another container with the natty template, it will rsync that image into place
[19:29] then it configures it, setting hostname, setting up interfaces,
[19:29] shuts up udev,
[19:29] since the template by default creates 4 ttys, we get rid of /etc/init/tty5 and 6
[19:30] since we're not installing lxcguest, we just empty out /lib/init/fstab,
[19:30] actually, that may be a problem
[19:30] upstart upgrades may overwrite that
[19:30] so we should instead have the lxc-natty template always install the lxcguest package
[19:30] (note to self)
[19:30] and finally, it installs the lxc configuration, which is that config file we looked at before with device access etc
[19:31] ok, I've been rambling, let me look for and address any/all questions
[19:31] kapil asked: What's the status of using lxc via libvirt?
[19:31] good question, zul has actually been working on that.
[19:32] libvirt-lxc in natty is fixed so that when you log out from the console, you don't kill the container any more
[19:32] secondly, you can use the same lxcguest package I mentioned before in libvirt-lxc,
[19:32] so you can pretty easily debootstrap an image, chroot to it to install lxcguest, and then use it in libvirt
[19:33] we still may end up writing a new libvirt lxc driver, as an alternative to the current one, which just calls out to liblxc, so that libvirt and liblxc can be used to manipulate the same containers
[19:33] but still haven't gotten to that
[19:34] kim0 asked: can I live migrate a lxc container
[19:34] nope
[19:34] for that, we'll first need checkpoint/restart.
[19:34] I have a ppa with some kernel and userspace pieces - basically packaging the current upstream efforts. But nothing upstream, nothing in natty, not very promising short-term
[19:35] ToyKeeper asked: Why would you want regular ttys in a container? Can't the host get a guest shell similar to openvz's "vzctl enter $guestID" ?
[19:35] nope,
[19:35] if the container is set up right, then you can of course ssh into it;
[19:35] or you can run lxc-start in a screen session so you can get back to it like that,
[19:36] what the regular 'lxc.tty = 4' gives you is the ability to do 'lxc-console' to log in
[19:36] as follows:
[19:36] I start the container with '-d' to not give me a console on my current tty
[19:36] then lxc-console -n natty1 connects me to the tty...
[19:37] ctrl-a q exits it
[19:37] now, the other way you might *want* to enter a container, which I think vzctl enter does,
[19:37] is to actually move your current task into the container
[19:37] That currently is not possible
[19:37] there is a kernel patch, being driven now by dlezcano, to make that possible, and a patch to lxc to use it via the 'lxc-attach' command.
[19:38] but the kernel patch is not yet accepted upstream
[19:38] so you cannot 'enter' a container
[19:38] rye asked: Are there any specific settings for cgroup mount for the host?
[19:38] Currently I just mount all cgroups.
}}}
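A sketch of the template and console workflow just described. The mirror URL is a made-up example, and passing MIRROR as an environment variable is an assumption about how the natty template reads it:

{{{
# Create a container, pointing debootstrap at a local apt-cacher-ng
MIRROR=http://localhost:3142/archive.ubuntu.com/ubuntu \
    lxc-create -f /etc/lxc-veth.conf -n natty1 -t natty

# Start detached (no console on the current tty), then attach to a tty
lxc-start -n natty1 -d
lxc-console -n natty1    # log in on one of the container's ttys;
                         # ctrl-a q detaches again
}}}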
{{{#!IRC
[19:39] Using fstab on the demo machine, or just 'mount -t cgroup cgroup /cgroup'
[19:39] the ns cgroup is going away soon,
[19:39] so when you don't have the ns cgroup mounted, then you'll need cgroup.clone_children to be 1
[19:40] however, you don't need that in natty. in n+1 you probably will.
[19:40] kim0 asked: How safe is it to let random strangers ssh into containers as root ? how safe is it to run random software inside containers .. can they break out
[19:40] not safe at all
[19:40] If you google for 'lxc lsm' you can find some suggestions for using selinux or smack to clamp down
[19:41] and, over the next year or two, I'm hoping to keep working on, and finally complete, the 'user namespaces'
[19:41] with user namespaces, you, as user 'kim0' and without privilege, would create a container. root in that container would have full privilege over things which you yourself own
[19:41] So any files owned by kim0 on the host; anything private to your namespaces, like your own hostname;
[19:41] BUT,
[19:42] even when that is done, there is another consideration: nothing is sitting between your users and the kernel
[19:42] so any syscalls which have vulnerabilities - and there are always some - can be exploited
[19:42] now,
[19:43] the fact is of course that similar concerns should keep you vigilant over other virtualization - kvm/vmware/etc - as well. The video driver, for instance, may allow the guest user to break out.
[19:43] kim0 asked: Can one enforce cpu/memory/network limits (cgroups?) on containers
[19:44] you can lock a container into one or several cpus,
[19:44] you can limit its memory,
[19:44] you can, it appears (this is new to me) throttle block io (which has been in the works for years :)
[19:45] the net_cls.classid has to do with some filtering based on packet labels. I've looked at it in the past, but never seen evidence of anyone using it
[19:46] for documentation on cgroups, I would look at Documentation/cgroups in the kernel source
[19:46] oh yes, and of course you can control access to devices
[19:47] you remove device access by writing to /cgroup/<container>/devices.deny, an entry of the form
[19:47] major:minor rwm
[19:47] where r=read, w=write, m=mknod
[19:47] oh, I lied,
[19:48] first is 'a' for any, 'c' for char, or 'b' for block,
[19:48] then major:minor, then rwm
[19:48] you can see the current settings for a cgroup in /cgroup/<container>/devices.list
[19:48] and allow access by writing to devices.allow
[19:48] sveiss asked: is there any resource control support integrated with containers? Limiting CPU, memory/swap, etc... I'm thinking along the lines of the features provided by Solaris, if you're familiar with those
[19:49] you can pin a container to a cpu, and you can track its usage, but you cannot (last I knew) limit % cpu
[19:49] oh, there is one more cgroup I've not mentioned, 'freezer', which as the name suggests lets you freeze a task.
[19:50] so I can start up the natty1 guest and then freeze it like so
[19:50] lxc-freeze just does 'echo "FROZEN" > /cgroup/$container/freezer.state' for me
[19:50] lxc-unfreeze thaws it
[19:51] can't get a console when it's frozen :)
[19:51] There are 10 minutes remaining in the current session.
}}}
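All of these limits are plain file writes into the mounted hierarchy. A minimal sketch, assuming all cgroup controllers are mounted together at /cgroup and the container is named natty1; the cpu and memory values are arbitrary examples:

{{{
mount -t cgroup cgroup /cgroup

# Pin the container to cpu 0 and cap its memory at 256 MB
echo 0 > /cgroup/natty1/cpuset.cpus
echo 268435456 > /cgroup/natty1/memory.limit_in_bytes

# Device whitelist: deny everything, then allow /dev/null (char 1:3)
echo a > /cgroup/natty1/devices.deny
echo 'c 1:3 rwm' > /cgroup/natty1/devices.allow
cat /cgroup/natty1/devices.list

# Freeze and thaw every task in the container
echo FROZEN > /cgroup/natty1/freezer.state
echo THAWED > /cgroup/natty1/freezer.state
}}}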
{{{#!IRC
[19:51] there are a few other lxc-* commands to help administration
[19:52] lxc-ls lists the available containers in the first line,
[19:52] and the active ones in the second
[19:52] lxc-info just shows a container's state
[19:52] lxc-ps shows tasks in the container, but you have to treat it just right
[19:53] lxc-ps just does 'ps' and shows you if any tasks in your bash session are in a container :)
[19:53] lxc-ps --name natty1 shows me the processes in container natty1
[19:53] and lxc-ps -ef shows me all tasks, prepended by the container any task is in
[19:53] lxc-ps --name natty1 --forest is the prettiest :)
[19:54] now, I didn't get a chance to try this in advance so I will probably fail, but
[19:54] hm
[19:56] There are 5 minutes remaining in the current session.
[19:56] there is the /lib/init/fstab which the lxcguest package will use
[19:57] ok, what I did there,
[19:57] was I had debootstrapped a stock image into 'demo1', I just installed lxcguest,
[19:57] and fired it up as a container
[19:57] only problem is I don't know the password :)
[19:57] kim0 asked: Any way to update the base natty template that gets rsync'ed to create new guests
[19:58] sure, chroot to /var/cache/lxc/natty and apt-get update :)
[19:58] ok, thanks everyone
[19:59] Thanks a lot .. It's been a great deep dive session
}}}
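For quick reference, the administrative commands shown in the session, plus the cache refresh trick; the paths assume a stock natty lxc install:

{{{
lxc-ls                           # line 1: all containers; line 2: running ones
lxc-info -n natty1               # show the container's state
lxc-ps --name natty1             # processes inside natty1
lxc-ps -ef                       # all tasks, prefixed with their container
lxc-ps --name natty1 --forest    # tree view of the container's processes

# Refresh the cached base image that the natty template rsyncs from
chroot /var/cache/lxc/natty apt-get update
}}}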