Dev Week -- Boot performance -- Keybuk -- Thu, Jan 22

UTC -4 (EST)

(01:01:47 PM) ***Keybuk wanders in and fiddles with the projector
(01:02:15 PM) Keybuk: I guess that I just get started
(01:02:18 PM) Keybuk: so, Hello
(01:02:32 PM) Keybuk: I'm Scott James Remnant, and I'll be having a bit of a chat about Boot Performance
(01:03:14 PM) Keybuk: I've not done one of these before, but I'll do my best
(01:03:26 PM) CrownAmbassador: Hi guys. Do you think I can list these 2 bugreports together? and the last one happens to be mine.
(01:03:41 PM) CrownAmbassador: sorry!
(01:03:45 PM) CrownAmbassador: wrong room
(01:03:48 PM) Keybuk: I'll try and answer any questions as I go, if they're relevant to what i'm talking about
(01:03:57 PM) Keybuk: and there should be plenty of time for general questions at the end
(01:04:19 PM) Keybuk: ok, so
(01:04:35 PM) Keybuk: boot performance, it's all about making Ubuntu useful to its users quicker
(01:04:58 PM) Keybuk: one of the first things is deciding exactly when you start and stop the clock
(01:05:19 PM) Keybuk: from a user's point of view, the machine starts booting when they press the power button
(01:05:37 PM) Keybuk: and stops booting when things stop moving around on the screen, and the machine stops making a loud disk noise
(01:05:53 PM) Keybuk: which is usually when they feel it's safe to try starting firefox
(01:06:19 PM) Keybuk: unfortunately us geeks know a bit more about what's going on
(01:06:23 PM) Keybuk: and we try and play games
(01:06:35 PM) Keybuk: for a long time, distributions only measured how long it took them to get to the login screen
(01:06:46 PM) Keybuk: and sped that up by starting lots of things after the login screen was up
(01:06:56 PM) Keybuk: (slowing down the login process)
(01:07:10 PM) Keybuk: and now there's a phase of starting things after the basic desktop is visible
(01:07:20 PM) Keybuk: and you hear things like "can we not start that 30s after boot?"
(01:07:35 PM) Keybuk: which is just doing the same again - it makes it slower for the user to use their machine
(01:07:51 PM) Keybuk: Windows tries many of these tracks
(01:08:05 PM) Keybuk: and years of experience has taught users not to touch it while things look like they're still loading
(01:08:21 PM) Keybuk: but then Windows has bugs like it closing the start menu on you ;)
(01:08:28 PM) Keybuk: and Ubuntu doesn't have any bugs like that
(01:08:35 PM) Keybuk: <fx: loud coughing noises>
(01:08:57 PM) creek23: :D
(01:09:08 PM) Keybuk: so, really, there's three distinct phases of a boot
(01:09:18 PM) Keybuk: you press the power button, and the hardware starts up and initialises (BIOS, etc.)
(01:09:29 PM) Keybuk: that hands over to our bootloader, and we start up the core operating system
(01:09:32 PM) Keybuk: (kernel, etc.)
(01:09:45 PM) Keybuk: and then we start the X server, and all of your desktop components and applets
(01:10:02 PM) Keybuk: now, we can't do anything about the first one
(01:10:10 PM) Keybuk: that's firmly in the hands of the hardware and chip manufacturers
(01:10:21 PM) Keybuk: but the last two are definitely under our control
(01:10:29 PM) Keybuk: so that's what we time
(01:10:58 PM) Keybuk: but be sure when you hear someone talking about a 30s boot, that they tell you what they mean by "Boot"
(01:12:09 PM) Keybuk: my times start from the boot loader, and end when we have a full desktop up
(01:12:36 PM) Keybuk: I'm hopeful that in the future, as we get better relationships with hardware partners, we'll be able to work closely on the hardware initialisation with them and start the time at the power button where it should be!
(01:13:01 PM) Keybuk: so
(01:13:09 PM) Keybuk: you may have noticed that boot performance has gotten a bit slower over the past 20 years
(01:13:41 PM) Keybuk: my modern, quad-core, all singing and dancing machine takes over a minute to boot
(01:13:53 PM) Keybuk: whereas the ZX Spectrum I had when I was 5 pretty much booted immediately
(01:14:11 PM) Keybuk: well, not quite it had a splash screen ;)  -- the screen went white, black, and then white again <g>
(01:14:35 PM) Keybuk: of course, it's not really fair
(01:14:52 PM) Keybuk: those machines had fixed hardware, their Operating System code was in ROM, and they just executed it
(01:15:10 PM) Keybuk: nowadays, not only do machines vastly differ in hardware, even from the same manufacturer
(01:15:20 PM) Keybuk: but users are able to add and remove hardware on the fly
(01:15:30 PM) Keybuk: and will often do so during boot and wonder why things go wrong ;)
(01:15:48 PM) Keybuk: and you can actually upgrade your operating system without a soldering iron
(01:16:15 PM) Keybuk: (someone who had a BBC Micro is going to point out that you didn't need a soldering iron because the chip was designed to come out fairly easily <g>)
(01:16:36 PM) Keybuk: so yes, we don't boot as fast as home machines of the 80s
(01:16:42 PM) Keybuk: but that's because they didn't really boot at all
(01:16:58 PM) Keybuk: and you can't plug a USB printer into a C64
(01:17:26 PM) Keybuk: so, I'm going to let you into a little secret
(01:17:41 PM) Keybuk: you can make your machine instantly boot faster
(01:17:47 PM) Keybuk: (or slower, if you're so inclined)
(01:17:57 PM) Keybuk: all it takes is changing one thing
(01:17:58 PM) Keybuk: the disk
(01:18:10 PM) Keybuk: your hard drive is by far the slowest part of your computer
(01:19:04 PM) Keybuk: if one person shows you a boot char on a quad
(01:19:07 PM) Keybuk: oops
(01:19:10 PM) Keybuk: if one person shows you a boot chart on a quad-core machine
(01:19:15 PM) Keybuk: and it's faster than your little laptop
(01:19:21 PM) Keybuk: it's not because of the cores, or even the blue leds on the side
(01:19:26 PM) Keybuk: it's because it has a much faster disk
(01:19:49 PM) Keybuk: and, unfortunately
(01:19:53 PM) Keybuk: the disk is where the operating system code lives
(01:19:57 PM) Keybuk: it's where all your configuration lives
(01:20:11 PM) Keybuk: boot is all about getting things from the disk and into memory
(01:20:16 PM) Keybuk: (and executing them on the processor)
(01:20:37 PM) Keybuk: so, to speed up the boot we need to either:
(01:20:40 PM) Keybuk:  1. load less from the disk
(01:20:50 PM) Keybuk:  2. be more efficient about our use of the disk
(01:21:13 PM) Keybuk: Loading less is the easy one
(01:21:38 PM) Keybuk: We take a good look at everything we do in the boot sequence, and we start being ruthless about it
(01:21:49 PM) Keybuk: How much of this stuff do we _really_ need to do on boot?
(01:21:57 PM) Keybuk: A good example here is a change we made quite early on in Ubuntu compared to Debian
(01:22:10 PM) Keybuk: we used to generate the database of available kernel modules on every single boot
(01:22:26 PM) Keybuk: this involves reading a lot of data, doing some CPU work, and writing it out to disk before using it
(01:22:38 PM) Keybuk: we don't do that anymore
(01:22:51 PM) Keybuk: now we regenerate that database only when you install a new kernel, or a new module package
(01:23:00 PM) Keybuk: and we do it in the package's maintainer scripts so it happens while you're running apt
(01:23:42 PM) Keybuk: it turns out that a lot of things done on boot could really be done during upgrades or software installations
(01:24:06 PM) rosset-brb is now known as rosset
(01:24:21 PM) Keybuk: another similar change is looking at whether services could be started on demand
(01:24:44 PM) Keybuk: do we need to start the entire printing subsystem until the user actually tries to print something?
(01:24:56 PM) Keybuk: do we need to start the bluetooth stack that early?
(01:25:10 PM) Keybuk: these are just examples of questions which haven't been answered yet, but that's the kind of thing we look at here
(01:25:16 PM) Keybuk: can we cut down on the amount of data we load
(01:25:29 PM) Keybuk: And secondly can we be more efficient about booting
(01:25:43 PM) Keybuk: this is kinda the same thing, but from a different point of view
(01:25:46 PM) max_ is now known as modestMAX
(01:25:51 PM) danne_ is now known as danne
(01:25:59 PM) Keybuk: oliver_g_ asked a question about compression
(01:26:01 PM) Keybuk: and it's a good example
(01:26:20 PM) Keybuk: in many cases, if there's a large amount of data to be read off the disk, it's better to have that data compressed
(01:26:24 PM) Keybuk: and decompress it in memory
(01:26:36 PM) Keybuk: the cost of decompression is frequently less than the reduction in disk time
(01:26:49 PM) Keybuk: and he's absolutely right that the prime target here is translations ;)
(01:27:05 PM) Keybuk: if you read 1,000 files on boot, that's slower than reading one single file with the same data
(01:27:31 PM) Keybuk: it might be easier to maintain your software with 1,000 xml files describing its configuration
(01:27:56 PM) Keybuk: but if you could pre-process that at build time to produce a single, compressed or binary file, with the data in it - you'd be amazed at the different
(01:28:22 PM) Keybuk: <Mirv> QUESTION: What chance of having an ugly, but very effective cache which includes collected files in a sequential order (200MB or so) that takes 2s to read, and includes every file needed for the whole boot? Prefetch had something experimental?
(01:28:28 PM) Keybuk: Mirv: ls /etc/readahead/boot
(01:28:37 PM) Keybuk: we've had such a thing for a number of releases now ;)
(01:30:10 PM) Keybuk: So, that's the "how" part
(01:30:14 PM) Keybuk: Do less and be more efficient
(01:30:22 PM) Keybuk: What approach do we take?
(01:30:31 PM) Keybuk: There's two schools of thought on this too
(01:30:43 PM) Keybuk: The first one is that you start off with your last release, and you sit down and examine it
(01:30:56 PM) Keybuk: you see what you can cut out, you see what you can improve, and make lots of incremental fixes
(01:31:12 PM) Keybuk: hopefully at the end of it, your boot will be a little bit faster
(01:31:27 PM) Keybuk: You can tell who's doing this, they say things like "the new version boots 10s faster than the old"
(01:31:47 PM) Keybuk: The second school of thought is that you start from scratch, and set yourself a target boot time from the very start
(01:31:57 PM) Keybuk: you say "we're going to boot in 30s"
(01:32:11 PM) Keybuk: you then split that up, and work out how much time you're going to give to each piece
(01:32:17 PM) Keybuk: "15s for the desktop, 15s for the core"
(01:32:20 PM) Keybuk: and split it up again
(01:32:33 PM) Keybuk: "5s for the kernel, 5s for basic stuff, 5s for services"
(01:32:37 PM) Keybuk: and at the end of it you have a budget
(01:32:49 PM) Keybuk: and you start at the beginning, and you work on one piece until you get it in under budget
(01:32:53 PM) Keybuk: and then you move on to the next piece
(01:33:07 PM) Keybuk: The second school gives much better results
(01:33:15 PM) Keybuk: but the first school means you can still release your distro if you don't make it
(01:33:39 PM) Keybuk: We're still following the first method, because we have a LOT of low hanging fruit
(01:33:59 PM) Keybuk: We already know a lot of the bugs and problems with our boot sequence, and we have more than enough work just fixing those for the next release or two
(01:34:41 PM) Keybuk: we're still at the point where our boot is over a minute
(01:34:50 PM) Keybuk: and we're looking at detail of multiple seconds or more
(01:35:04 PM) Keybuk: so we've got plenty of work to do ;)
(01:35:19 PM) Keybuk: At some point, we'll have reached the fastest we can go with this method
(01:35:24 PM) Keybuk: all the fixes and bugs we know about will be gone
(01:35:29 PM) Keybuk: and it'll be as fast as we can get it
(01:35:33 PM) Keybuk: (I reckon this is around the 30s mark)
(01:35:44 PM) Keybuk: at that point, we'd be looking at switching to the second method
(01:36:02 PM) Keybuk: we might spend an entire release or two just making the kernel come up in 1s instead of 2.5s to get it under budget
(01:36:43 PM) Keybuk: So how do you know how fast your computer is booting?
(01:36:49 PM) Keybuk: And how do you work out where you can speed it up
(01:37:04 PM) Keybuk: There's a piece of software in the archive called "bootchart"
(01:37:06 PM) Keybuk: it's really easy
(01:37:09 PM) Keybuk: apt-get install bootchart
(01:37:17 PM) Keybuk: and every time you reboot, you'll get a PNG file in /var/log/bootchart
(01:37:41 PM) Keybuk: it won't include your login and suchforth, but you can abuse it to chart everything
(01:38:12 PM) Keybuk: if you remove the /etc/rc2.d/S99stop-bootchart symlink, and remember to run "/etc/init.d/stop-bootchart start" after you login
(01:38:15 PM) Keybuk: you can have the whole thing
(01:38:25 PM) Keybuk: (and then use gimp to cut the right hand side off that you don't want)
(01:38:39 PM) Keybuk: <Ireyon> doesn't bootchart slow down the boot process, too? :D
(01:38:40 PM) Keybuk: yes.
(01:38:45 PM) Keybuk: but that's ok
(01:38:58 PM) Keybuk: bootchart is for when you're looking at what your boot does
(01:39:00 PM) Keybuk: and you make some changes
(01:39:02 PM) Keybuk: and compare the before and after
(01:39:18 PM) Keybuk: it's not something you'd leave installed all the time
(01:39:32 PM) Keybuk: so you have bootchart
(01:39:37 PM) Keybuk: you also need a machine to run it on
(01:39:55 PM) Keybuk: comparing bootcharts produced from two different installations is rarely  useful
(01:40:03 PM) Keybuk: comparing bootcharts produced from two different machines is almost never useful
(01:40:25 PM) Keybuk: we're using a standard machine for our work
(01:40:33 PM) Keybuk: which means we can compare bootcharts between ourselves
(01:40:39 PM) Keybuk: the machine we picked was a Dell Mini 9
(01:40:51 PM) Keybuk: it's got a few features we wanted
(01:41:01 PM) Keybuk: it has the slower Intel Atom processor, which means that things tend to show up better
(01:41:06 PM) Keybuk: and it has an SSD disk
(01:41:14 PM) Keybuk: also boot speed is a hot topic in the netbook space in general
(01:41:22 PM) Keybuk: and most importantly, it's a standard off-the-shelf piece of equipment
(01:41:32 PM) Keybuk: anyone who has one is pretty much guaranteed to have an identical piece of kit to everyone else
(01:42:00 PM) Keybuk: so for the last bit, I'd like to talk a bit about how we're doing
(01:42:02 PM) Keybuk: and what's next ;)
(01:42:09 PM) Keybuk: so I have some boot charts to show you
(01:42:14 PM) Keybuk:
(01:42:28 PM) Keybuk: this is from the Mini 9 with the factory-installed UNR image
(01:42:40 PM) Keybuk: the guys in Lexington did some amazing work, the whole desktop comes up in 36s
(01:42:51 PM) Keybuk: (or even 35s
(01:43:22 PM) Keybuk: so how does Intrepid compare?
(01:43:25 PM) Keybuk:
(01:43:28 PM) Keybuk: Not well.
(01:43:37 PM) Keybuk: a default Intrepid install is 71s!
(01:43:41 PM) Keybuk: twice as long
(01:43:51 PM) Keybuk: this is because the UNR image is extremely customised for its hardware
(01:43:51 PM) apachelogger is now known as apachelogger_
(01:44:01 PM) Keybuk: whereas the Intrepid image is generic, and portable to any Intel hardware
(01:44:10 PM) Keybuk: and, most importantly, any user configuration
(01:44:33 PM) Keybuk: but we've been working on that
(01:44:40 PM) Keybuk: here's one for Jaunty with some of the improvements we've made
(01:44:40 PM) Keybuk:
(01:44:55 PM) Keybuk: down to 53s
(01:45:35 PM) Keybuk: and that's still a generic image
(01:45:43 PM) Keybuk: you should be able to see a similar improvement on any machine
(01:45:53 PM) Keybuk: there's lots of interesting things to read off these charts
(01:45:57 PM) Keybuk: I'll cover the hilights
(01:46:09 PM) Keybuk: the top graph is useful
(01:46:13 PM) Keybuk: it tells us how much of the CPU we're using
(01:46:19 PM) Keybuk: (or at least how much time we're not in userspace)
(01:46:27 PM) Keybuk: we rarely max out the CPU during boot
(01:46:45 PM) Keybuk: so paradoxically, computing data on boot is faster than reading a cache off disk - assuming you don't need the disk for computation
(01:46:59 PM) Keybuk: that's why compression can help
(01:47:02 PM) Keybuk: but note that it's not as if the CPU is idle
(01:47:08 PM) Keybuk: so just compressing the entire disk would be a net loss
(01:47:14 PM) Keybuk: Red in that top graph is *BAD*
(01:47:17 PM) Keybuk: it means we're waiting for the disk
(01:47:26 PM) Keybuk: the second red graph is disk utilisation
(01:47:34 PM) Keybuk: there's some interesting spikes
(01:47:43 PM) Keybuk: the first one is the readahead process, where we read the data in from the disk that we think we use
(01:47:52 PM) Keybuk: it goes all spiky after X starts because we don't read that stuff in yet
(01:48:12 PM) Keybuk: there's a cute spike around 13s - no idea what that one is
(01:48:28 PM) Keybuk: (the processes in the chart go red if they're using disk too)
(01:48:42 PM) Keybuk: the larger one around 18s seems to be the rc script, that's got to be a bug
(01:48:55 PM) Keybuk: and the big spike at 24s is syslog starting up
(01:49:14 PM) Keybuk: <StyXman> QUESTION: why thre's this big gap of no CPU or disk activity around the 45s in the last chart?
(01:49:17 PM) Keybuk: a VERY good question ;-)
(01:49:27 PM) Keybuk: in fact, as soon as I looked at this graph, I asked the very same one
(01:49:36 PM) Keybuk: dead space in the graph means the system is idle
(01:49:43 PM) Keybuk: I have a theory
(01:49:51 PM) Keybuk: scroll down and look what's happening around then
(01:49:55 PM) Keybuk: we're starting gnome panel applets
(01:50:08 PM) Keybuk: the panel looks like it's started around 40-43s in
(01:50:15 PM) Keybuk: and there's a whole bunch of applets that get started with it
(01:50:17 PM) Keybuk: then there's a pause
(01:50:22 PM) Keybuk: and a SECOND round of applets get started
(01:50:32 PM) Keybuk: I think the session manager has a sleep(5) in it <g>
(01:50:41 PM) Keybuk: I think it starts one set, sleeps for 5s and starts the second set
(01:50:55 PM) Keybuk: this is exactly what I mean about gross and obvious bugs
(01:51:40 PM) Keybuk: there's some other bugs in here too
(01:51:53 PM) Keybuk: see the sleep around 14s in, and the other one around 20s in?
(01:52:01 PM) Keybuk: every time someone calls sleep during boot, kittens die
(01:52:10 PM) Keybuk: there's still some obvious hogs of processes
(01:52:14 PM) Keybuk: udev (well modprobe really)
(01:52:15 PM) Keybuk: X
(01:52:16 PM) Keybuk: compiz
(01:52:19 PM) Keybuk: nautilus
(01:52:22 PM) Keybuk: the gnome-panel
(01:52:31 PM) Keybuk: these are doing an extraordinary amount of work
(01:52:47 PM) Keybuk: there's a bizarre logsave call around 14s in as well
(01:52:51 PM) Keybuk: no idea why
(01:52:56 PM) Keybuk: and some other bugs
(01:53:00 PM) Keybuk: trackerd gets started, but is disabled
(01:53:05 PM) Keybuk: so why is it doing so much IO ?
(01:53:20 PM) Keybuk: and the bluetooth applet and jockey-gtk seem very expensive for their size
(01:53:33 PM) Keybuk: <loic-m> QUESTION: are you also loking on suspend-to-disk/resume speed (and can bootchart be used for that purpose)?
(01:53:47 PM) Keybuk: loic-m: personally, no;  suspend and resume speed is almost certainly disk bound
(01:54:06 PM) Keybuk: hard to use bootchart for that due to the way it works
(01:54:10 PM) Keybuk: you'd need something in-kernel
(01:54:20 PM) Keybuk: <Ireyon> QUESTON: Does the system use multiple cores to boot if available? Wouldn't it be possible to load some less important services while GDM/Gnome starts up? (GDM is horribly slow in intrepid)
(01:54:29 PM) Keybuk: Ireyon: we always use multiple cores
(01:54:40 PM) Keybuk: however you'll note that we're really not maxing out a single low-powered CPU here
(01:54:43 PM) Keybuk: so it won't help any
(01:54:58 PM) Keybuk: with autologin, GDM doesn't seem to take any time to start up
(01:55:00 PM) Keybuk: the server does
(01:55:16 PM) Keybuk: and loading services alongside will just slow it down more
(01:55:37 PM) Keybuk: <oliver_g_> btw. is the Dell Mini 9 have a dualcore system?
(01:55:41 PM) Keybuk: oliver_g_: it's a dual-core Intel Atom iirc
(01:56:58 PM) Keybuk: so any other questions? :-)
(01:57:32 PM) tethridge: So what is the boot goal for 9.04?
(01:57:50 PM) Keybuk: we don't have a specific goal at this point, we're just cutting the crap and fixing bugs for now
(01:58:12 PM) tethridge: or what do you see as realistic considering the amount of time you have left to work out low hanging fruit
(01:58:14 PM) Keybuk: it looks like a full desktop is attainable in around 30s
(01:58:25 PM) Keybuk: but I think for jaunty, on this platform, 45s is more likely
(01:59:05 PM) Keybuk: <StyXman> QUESTION: can readahead be improved in this regard? ir is the disk activity after it already finished all write calls?
(01:59:20 PM) Keybuk: StyXman: the prime improvement for readahead will be building the list of blocks to read
(01:59:30 PM) Keybuk: right now, it's generated for each CD, and gets increasingly out of date
(01:59:40 PM) Keybuk: <GSMX> QUESTION: Can't boot time be less than 2 seconds? I mean, with all this computing power nowadays, boottime was faster on win95!
(01:59:48 PM) Keybuk: GSMX: boot time is nothing to do with computing power, see above ;-)
(01:59:54 PM) tethridge: is upstart now fully optimized for boot speed?  I know that in past releases we were making a gradual shift to upstart.
(01:59:58 PM) Keybuk: <StyXman> QUESTION: if they're write calls, can these be cached and flushed after we finished booting? I've seen my laptop not writing anything to disk while on batteries
(02:00:05 PM) Keybuk: StyXman: we shouldn't really write anything during boot ;-)
(02:00:13 PM) Keybuk: tethridge: upstart is unrelated to boot speed
(02:00:54 PM) Keybuk: ok, I'm out of time now ;)
(02:01:04 PM) Keybuk: but do feel free to grab me at any time

MeetingLogs/devweek0901/BootPerform (last edited 2009-01-22 21:14:30 by pool-71-182-96-163)