Using Linux Containers in Natty -- hallyn

   1 [19:01] <ClassBot> Logs for this session will be available at following the conclusion of the session.
   2 [19:02] <hallyn> Ok, hey all
   3 [19:03] <hallyn> I'm going to talk about containers on natty.
   4 [19:03] <hallyn> In the past, that is, until definitely lucid, there were some constraints which made containers more painful to administer -
   5 [19:03] <hallyn> i.e.  you couldn't safely upgrade udev
   6 [19:03] <hallyn> that's now gone!
   7 [19:04] <hallyn> but, let me start at the start
   8 [19:04] <hallyn> containers, for anyone really new, are a way to run what appear to be different VMs, but without the overhead of an OS for each VM, and without any hardware emulation
   9 [19:04] <hallyn> so you can fit in a lot of containers on old hardware with little overhead
  10 [19:04] <hallyn> they are similar to openvz and vserver - they're not competition, though.
  11 [19:05] <hallyn> rather, they're the ongoing work to upstream the functionality from vserver and openvz
  12 [19:05] <hallyn> Containers are a userspace fiction built on top of some nifty kernel functionality.
  13 [19:05] <hallyn> There are two popular implementations right now:
  14 [19:05] <hallyn> the libvirt lxc driver, and liblxc (or just 'lxc') from
  15 [19:06] <hallyn> Here, I'm talking about
  16 [19:06] <hallyn> All right, in order to demo some lxc functionality, I set up a stock natty VM on amazon.  You can get to it as:
  17 [19:06] <hallyn> ssh -l guest
  18 [19:06] <hallyn> password is 'none'
  19 [19:06] <hallyn> that should get you into read-only screen session.  To get out, hit '~.' to kill ssh.
  20 [19:07] <hallyn> One of the kernel pieces used by containers is the namespaces.
  21 [19:07] <hallyn> You can use just the namespaces (for fun) using 'lxc-unshare'
  22 [19:07] <hallyn> it's not a very user-friendly command, though.
  23 [19:07] <hallyn> because it's rarely used...
  24 [19:08] <hallyn> what I just did there on the demo is to unshare my mount, pid, and utsname (hostname) namespaces
  25 [19:08] <hallyn> using 'lxc-unshare -s "MOUNT|PID|UTSNAME" /bin/bash'
  26 [19:08] <hallyn> lxc-unshare doesn't remount /proc for you, so I had to do that.  Once I've done that, ps only shows tasks in my pid namespace
  27 [19:08] <hallyn> also, I can change my hostname without changing the hostname on the rest of the system
  28 [19:09] <hallyn> When I exited the namespace, I was brought back to a shell with the old hostname
  29 [19:10] <hallyn> all right, another thing used by containers is bind mounts.  Not much to say about them, let me just do a quick demo of playing with them:
  30 [19:10] <ClassBot> ToyKeeper asked: Will there be a log available for this screen session?
  31 [19:11] <hallyn> yes,
  32 [19:11] <hallyn> oh, no. sorry
  33 [19:11] <hallyn> didn't think to set that up
  34 [19:11] <hallyn> hm,
  35 [19:11] <hallyn> ok, i'm logging it as of now.  I'll decide where to put it later.  thanks.
  36 [19:12] <hallyn> nothing fancy, just bind-mounting filesystems
  37 [19:13] <hallyn> which is a way of saving a lot of space, if you share /usr and /lib amongst a lot of containers
  38 [19:13] <hallyn> anyway, moving on to actual usage
  39 [19:14] <hallyn> Typically there are 3 ways that I might set up networking for a container
  40 [19:14] <hallyn> Often, if I'm lazy or already have it set up, I'll re-use the libvirt bridge, virbr0, to bind container NICs to
  41 [19:16] <hallyn> well, at least apt-get worked :)
  42 [19:16] <hallyn> If I'm on a laptop using wireless, I'll usually do that route, because you can't directly bridge a wireless NIC.
  43 [19:16] <hallyn> And otherwise I'd have to set up my own iptables rules to do the forwarding from containers bridge to the host NIC
  44 [19:17] <hallyn> If I'm on a 'real' host, I'll bridge the host's NIC and use that for containers.
  45 [19:17] <hallyn> that's what lxc-veth.conf does
  46 [19:17] <hallyn> So first you have to set up /etc/network/interfaces to have br0 be a bridge,
  47 [19:18] <hallyn> have eth0 not have an address, and make eth0 a bridge-port on br0
  48 [19:18] <hallyn> as seen on the demo
  49 [19:18] <hallyn> Since that's set up, I can create a bridged container just using:
  50 [19:18] <hallyn> 'lxc-create -f /etc/lxc-veth.conf -n nattya -t natty'
  51 [19:18] <hallyn> nattya is the name of the container,
  52 [19:18] <hallyn> natty is the template I'm using
  53 [19:18] <hallyn> and /etc/lxc-veth.conf is the config file to specify how to network
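A config like /etc/lxc-veth.conf usually needs only a few lxc.network lines. The Python sketch below renders such a file and builds the lxc-create command line quoted in the session. The helper names are hypothetical, and the values (veth, br0) are assumptions based on the bridge setup described above rather than a copy of the file on the demo machine.

```python
def render_veth_config(bridge="br0"):
    """Return the text of a minimal lxc config attaching a veth NIC to `bridge`.

    Assumption: the container's NIC should be bridged to `bridge` (br0 in the
    session; virbr0 if you reuse the libvirt bridge instead).
    """
    lines = [
        "lxc.network.type = veth",       # virtual ethernet pair
        f"lxc.network.link = {bridge}",  # host bridge the pair is attached to
        "lxc.network.flags = up",        # bring the interface up at start
    ]
    return "\n".join(lines) + "\n"


def lxc_create_argv(name, template, config_path):
    """Build the lxc-create argument list used in the session (not executed)."""
    return ["lxc-create", "-f", config_path, "-n", name, "-t", template]


if __name__ == "__main__":
    print(render_veth_config(), end="")
    print(" ".join(lxc_create_argv("nattya", "natty", "/etc/lxc-veth.conf")))
```

Passing "virbr0" as the bridge gives the lazy laptop variant mentioned earlier; nothing here touches the system, it only produces text.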
  54 [19:19] <hallyn> ruh roh
  55 [19:20] <hallyn> so lxc-create is how you create a new container
  56 [19:20] <hallyn> The rootfs and config files for each container are in /var/lib/lxc
  57 [19:20] <hallyn> you see there are three containers there - natty1, which I created before this session, and nattya and nattyz which I just created
  58 [19:21] <hallyn> The config file under /var/lib/lxc/natty1 shows some extra information,
  59 [19:21] <hallyn> including how many tty's to set up,
  60 [19:21] <hallyn> and which devices to allow access to
  61 [19:21] <hallyn> the first devices line, 'lxc.cgroup.devices.deny = a' means 'by default, don't allow any access to any device.'
  62 [19:21] <hallyn> from there any other entries are whitelist entries
  63 [19:22] <ClassBot> kim0 asked: Can I run a completely different system like centos under lxc on ubuntu ?
  64 [19:22] <hallyn> yes, you can, and many people do.
  65 [19:22] <hallyn> The main problem, usually, is in actually first setting up a container with that distro which works
  66 [19:23] <hallyn> You can't 100% use a stock iso install and have it boot as a container
  67 [19:23] <hallyn> It used to be there was a lot of work you had to do to make that work,
  68 [19:23] <hallyn> but now we're down to very few things.  In fact, for ubuntu natty, we have a package called 'lxcguest'
  69 [19:23] <hallyn> if you take a stock ubuntu natty image,
  70 [19:23] <hallyn> and install 'lxcguest', then it will allow that image to boot as a container
  71 [19:24] <hallyn> It actually only does two things now:
  72 [19:24] <hallyn> 1. it detects that it is in a container (based on a boot argument provided by lxc-start),
  73 [19:24] <hallyn> uh, that wasn't supposed to be 1 :),
  74 [19:24] <hallyn> and based on that, if it is in a container, it
  75 [19:24] <hallyn> 1. starts a console on /dev/console, so that 'lxc-start' itself gets a console (like you see when i start a container)
  76 [19:25] <hallyn> 2. it changes /lib/init/fstab to one with fewer filesystems,
  77 [19:25] <hallyn> bc there are some which you cannot or should not mount in a container.
  78 [19:25] <hallyn> now, lxc ships with some 'templates'.
  79 [19:25] <hallyn> these are under /usr/lib/lxc/tempaltes
  80 [19:25] <hallyn> /usr/lib/lxc/templates that is
  81 [19:26] <hallyn> some of those templates, however, don't quite work right.  So a next work item we want to tackle is to make those all work better, and add more
  82 [19:26] <hallyn> let's take a look at the lxc-natty one:
  83 [19:27] <hallyn> it takes a MIRROR option, which I always use at home, which lets me point it at a apt-cacher-ng instance
  84 [19:28] <hallyn> it starts by doing a debootstrap of a stock natty image into /var/cache/lxc/natty/
  85 [19:28] <hallyn> so then, every time you create another container with natty template, it will rsync that image into place
  86 [19:29] <hallyn> then it configures it, setting hostname, setting up interfaces,
  87 [19:29] <hallyn> shuts up udev,
  88 [19:29] <hallyn> since the template by default creates 4 tty's, we get rid of /etc/init/tty5 and 6
  89 [19:30] <hallyn> since we're not installing lxcguest, we just empty out /lib/init/fstab,
  90 [19:30] <hallyn> actually, that may be a problem
  91 [19:30] <hallyn> upstart upgrades may overwrite that
  92 [19:30] <hallyn> so we should instead have lxc-natty template always install the lxcguest package
  93 [19:30] <hallyn> (note to self)
  94 [19:30] <hallyn> and finally, it installs the lxc configuration, which is that config file we looked at before with device access etc
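The first two template steps described above (debootstrap once into a cache, then rsync the cache into each new container) can be sketched in Python as follows. This is a simplified illustration, not the lxc-natty template itself: the function name is hypothetical, the default mirror URL is an assumption, and the real template does more (hostname, interfaces, udev, ttys, fstab, config).

```python
import os


def template_commands(cache_dir, rootfs_dir,
                      mirror="http://archive.ubuntu.com/ubuntu"):
    """Return the commands a cache-then-copy template would run (not executed).

    Assumptions: `cache_dir` is the shared image (e.g. /var/cache/lxc/natty/),
    `rootfs_dir` is the new container's root filesystem, and `mirror` may point
    at an apt-cacher-ng instance, as with the MIRROR option.
    """
    commands = []
    if not os.path.isdir(cache_dir):
        # First container only: build the stock image into the cache.
        commands.append(["debootstrap", "natty", cache_dir, mirror])
    # Every container: copy the cached image into place.
    source = cache_dir.rstrip("/") + "/"
    target = rootfs_dir.rstrip("/") + "/"
    commands.append(["rsync", "-a", source, target])
    return commands
```

Returning the command lists instead of running them keeps the sketch safe to try without root; a caller could hand each list to subprocess.run.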
  95 [19:31] <hallyn> ok, i've been rambling, let me look for and address any/all questions
  96 [19:31] <ClassBot> kapil asked: What's the status of using lxc via libvirt?
  97 [19:31] <hallyn> good question, zul has actually been working on that.
  98 [19:32] <hallyn> libvirt-lxc in natty is fixed so that when you log out from console, you don't kill the container any more :
  99 [19:32] <hallyn> secondly, you can use the same lxcguest package I mentioned before in libvirt-lxc,
 100 [19:32] <hallyn> so you can pretty easily debootstrap an image, chroot to it to install lxcguest, and then use it in libvirt
 101 [19:33] <hallyn> we still may end up writing a new libvirt lxc driver, as an alternative to the current one, which just calls out to liblxc, so that libvirt and liblxc can be used to manipulate the same containers
 102 [19:33] <hallyn> but still haven't gotten to that
 103 [19:34] <ClassBot> kim0 asked: can I live migrate a lxc container
 104 [19:34] <hallyn> nope
 105 [19:34] <hallyn> for that, we'll first need checkpoint/restart.
 106 [19:34] <hallyn> I have a ppa with some kernel and userspace pieces - basically packaging the current upstream efforts.  But nothing upstream, nothing in natty, not very promising short-term
 107 [19:35] <ClassBot> ToyKeeper asked: Why would you want regular ttys in a container?  Can't the host get a guest shell similar to openvz's "vzctl enter $guestID" ?
 108 [19:35] <hallyn> nope,
 109 [19:35] <hallyn> if the container is set up right, then you can of course ssh into it;
 110 [19:35] <hallyn> or you can run lxc-start in a screen session so you can get back to it like that,
 111 [19:36] <hallyn> what the regular 'lxc.tty = 4' gives you is the ability to do 'lxc-console' to log in
 112 [19:36] <hallyn> as follows:
 113 [19:36] <hallyn> I start the container with '-d' to not give me a console on my current tty
 114 [19:36] <hallyn> then lxc-console -n natty1 connects me to the tty...
 115 [19:37] <hallyn> ctrl-a q exits it
 116 [19:37] <hallyn> now, the other way you might *want* to enter a container, which i think the vzctl enter does,
 117 [19:37] <hallyn> is to actually move your current task into the container
 118 [19:37] <hallyn> That currently is not possible
 119 [19:37] <hallyn> there is a kernel patch, being driven now by dlezcano, to make that possible, and a patch to lxc to use it using the 'lxc-attach' command.
 120 [19:38] <hallyn> but the kernel patch is not yet accepted upstream
 121 [19:38] <hallyn> so you cannot 'enter' a container
 122 === niemeyer_bbl is now known as niemeyer
 123 [19:38] <ClassBot> rye asked: Are there any specific settings for cgroup mount for the host?
 124 [19:38] <hallyn> Currently I just mount all cgroups.
 125 [19:39] <hallyn> Using fstab in the demo machine, or just 'mount -t cgroup cgroup /cgroup'
 126 [19:39] <hallyn> the ns cgroup is going away soon,
 127 [19:39] <hallyn> so when you don't have ns cgroup mounted, then you'll need cgroup.clone_children to be 1
 128 [19:40] <hallyn> however, you don't need that in natty.  in n+1 you probably will.
 129 [19:40] <ClassBot> kim0 asked: How safe is it to let random strangers ssh into containers as root ? how safe is it to run random software inside containers .. can they break out
 130 [19:40] <hallyn> not safe at all
 131 [19:40] <hallyn> If you google for 'lxc lsm' you can find some suggestions for using selinux or smack to clamp down
 132 [19:41] <hallyn> and, over the next year or two, I'm hoping to keep working on, and finally complete, the 'user namespaces'
 133 === Jackson is now known as Guest46715
 134 [19:41] <hallyn> with user namespaces, you, as user 'kim0' and without privilege, would create a container.  root in that container would have full privilege over things which you yourself own
 135 [19:41] <hallyn> So any files owned by kim0 on the host;  anything private to your namespaces, like your own hostname;
 136 [19:41] <hallyn> BUT,
 137 [19:42] <hallyn> even when that is done, there is another consideration:  nothing is sitting between your users and the kernel
 138 [19:42] <hallyn> so any syscalls which have vulnerabilities - and there are always some - can be exploited
 139 [19:42] <hallyn> now,
 140 [19:43] <hallyn> the fact is of course that similar concerns should keep you vigilant over other virtualization - kvm/vmware/etc - as well.  The video driver, for instance, may allow the guest user to break out.
 141 [19:43] <ClassBot> kim0 asked: Can one enforce cpu/memory/network limits (cgroups?) on containers
 142 [19:44] <hallyn> you can lock a container into one or several cpus,
 143 [19:44] <hallyn> you can limit its memory,
 144 [19:44] <hallyn> you can, it appears (this is new to me) throttle block io (which has been in the works for years :)
 145 [19:45] <hallyn> the net_cls.classid has to do with some filtering based on packet labels.  I've looked at it in the past, but never seen evidence of anyone using it
 146 [19:46] <hallyn> for documentation on cgroups, I would look at Documentation/cgroups in the kernel source
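In an lxc container config, such limits are written as lxc.cgroup.<subsystem file> = <value> lines, the same form as the 'lxc.cgroup.devices.deny = a' line shown earlier. The Python sketch below renders the CPU-pinning and memory lines; the function name and the example values are hypothetical.

```python
def cgroup_limit_lines(cpus=None, memory=None):
    """Return lxc config lines that pin a container to CPUs and cap its memory.

    Assumptions: `cpus` is a cpuset list such as "0" or "0-1", and `memory`
    is a value the memory cgroup accepts, such as "256M".
    """
    lines = []
    if cpus is not None:
        lines.append(f"lxc.cgroup.cpuset.cpus = {cpus}")
    if memory is not None:
        lines.append(f"lxc.cgroup.memory.limit_in_bytes = {memory}")
    return lines


if __name__ == "__main__":
    for line in cgroup_limit_lines(cpus="0-1", memory="256M"):
        print(line)
```

The resulting lines would be appended to the container's config file under /var/lib/lxc/<name>.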
 147 [19:46] <hallyn> oh yes, and of course you can access devices
 148 [19:47] <hallyn> you remove device access by writing to /cgroup/<name>/devices.deny, an entry of the form
 149 [19:47] <hallyn> major:minor rwm
 150 [19:47] <hallyn> where r=read,w=write,m=mknod
 151 [19:47] <hallyn> oh, i lied,
 152 [19:48] <hallyn> first is 'a' for any, 'c' for char, or 'b' for block,
 153 [19:48] <hallyn> then major:minor, then rwm
 154 [19:48] <hallyn> you can see the current settings for cgroup in /cgroup/devices.list
 155 [19:48] <hallyn> and allow access by writing to devices.allow
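The entry format just described (type, then major:minor, then access flags) is easy to get wrong by hand, so here is a small Python sketch that builds and validates one. The function name is hypothetical; the example device numbers (character device 1:3 for /dev/null) are standard Linux assignments, not taken from the demo.

```python
def device_entry(kind, major, minor, access):
    """Format a devices-cgroup entry such as 'c 1:3 rwm'.

    kind:   'a' (any), 'c' (char) or 'b' (block)
    major, minor: device numbers, or '*' as a wildcard
    access: any non-empty combination of r (read), w (write), m (mknod)
    """
    if kind not in ("a", "c", "b"):
        raise ValueError(f"unknown device type: {kind!r}")
    if not access or set(access) - set("rwm"):
        raise ValueError(f"access must use only r, w, m: {access!r}")
    return f"{kind} {major}:{minor} {access}"


if __name__ == "__main__":
    # Allow full access to /dev/null; the string would be written to
    # /cgroup/<name>/devices.allow (or to devices.deny to remove access).
    print(device_entry("c", 1, 3, "rwm"))
```

Writing just 'a' to devices.deny, as in the 'lxc.cgroup.devices.deny = a' config line, is the special short form that denies everything before the whitelist entries are added.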
 156 [19:48] <ClassBot> sveiss asked: is there any resource control support integrated with containers? Limiting CPU, memory/swap, etc... I'm thinking along the lines of the features provided by Solaris, if you're familiar with those
 157 [19:49] <hallyn> you can pin a container to a cpu, and you can track its usage, but you cannot (last I knew) limit % cpu
 158 [19:49] <hallyn> oh, there is one more cgroup i've not mentioned, 'freezer', which as the name suggests lets you freeze a task.
 159 [19:50] <hallyn> so i can start up the natty1 guest and then freeze it like so
 160 [19:50] <hallyn> lxc-freeze just does 'echo "FROZEN" > /cgroup/$container/freezer.state' for me
 161 [19:50] <hallyn> lxc-thaw thaws it
 162 [19:50] <hallyn> make that lxc-unfreeze :)
 163 [19:51] <hallyn> can't get a console when it's frozen :)
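What lxc-freeze and lxc-unfreeze do under the hood can be mimicked directly, as in the Python sketch below. The function name is hypothetical, the /cgroup mount point is the one used in this session, and "THAWED" is the value the kernel's freezer cgroup expects for unfreezing.

```python
import os


def set_freezer_state(container, state, cgroup_root="/cgroup"):
    """Write FROZEN or THAWED to the container's freezer.state file.

    Assumption: all cgroups are mounted at `cgroup_root` and the container's
    cgroup directory is named after the container, as in the demo.
    Returns the path that was written.
    """
    if state not in ("FROZEN", "THAWED"):
        raise ValueError(f"unsupported freezer state: {state!r}")
    path = os.path.join(cgroup_root, container, "freezer.state")
    with open(path, "w") as handle:
        handle.write(state + "\n")
    return path
```

On a real host this needs root, e.g. set_freezer_state("natty1", "FROZEN") and later set_freezer_state("natty1", "THAWED"); pointing cgroup_root at a scratch directory lets you try it harmlessly.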
 164 [19:51] <ClassBot> There are 10 minutes remaining in the current session.
 165 [19:51] <hallyn> there are a few other lxc-* commands to help administration
 166 [19:52] <hallyn> lxc-ls lists the available containers in the first line,
 167 [19:52] <hallyn> and the active ones in the second
 168 [19:52] <hallyn> lxc-info just shows its state
 169 [19:52] <hallyn> lxc-ps shows tasks in the container, but you have to treat it just right
 170 [19:53] <hallyn> lxc-ps just does 'ps' and shows you if any tasks in your bash session are in a container :)
 171 [19:53] <hallyn> lxc-ps --name natty1 shows me the processes in container natty1
 172 [19:53] <hallyn> and lxc-ps -ef shows me all tasks, prepended by the container any task is in
 173 [19:53] <hallyn> lxc-ps --name natty1 --forest is the prettiest :)
 174 [19:54] <hallyn> now, i didn't get a chance to try this in advance so it will probably fail, but
 175 [19:54] <hallyn> hm
 176 [19:56] <ClassBot> There are 5 minutes remaining in the current session.
 177 [19:56] <hallyn> there is the /lib/init/fstab which lxcguest package will use
 178 [19:57] <hallyn> ok, what i did there,
 179 [19:57] <hallyn> was i had debootstrapped a stock image into 'demo1',  i just installed lxcguest,
 180 [19:57] <hallyn> and fired it up as a container
 181 [19:57] <hallyn> only problem is i don't know the password :)
 182 [19:57] <ClassBot> kim0 asked: Any way to update the base natty template that gets rsync'ed to create new guests
 183 [19:58] <hallyn> sure, chroot to /var/cache/lxc/natty1 and apt-get update :)
 184 [19:58] <hallyn> ok, thanks everyone
 185 [19:59] <kim0> Thanks a lot .. It's been a great deep dive session

UbuntuCloudDays/23032011/UsingLinuxContainersInNatty (last edited 2011-03-26 17:03:21 by nigelbabu)