Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.
Launchpad Entry: prefetch
Packages affected: kernel, readahead, prefetch
One of our 2007 Summer of Code projects was the development of kernel patches and other necessary parts to prefetch files and speed up the boot sequence. Evaluate and plan to integrate.
Booting the computer and starting applications now happens faster, due to the new "prefetch" project. This profiles which files are read in which order and automatically optimizes the order of the data on the hard disk.
Our current solution, readahead, has some problems which the new prefetch solves:
- Boot order needs to be maintained and updated manually. It is prone to be forgotten by the developers at release crunch time.
- Requires updating the CDs, since it has a static profile.
- Boot time gets worse over time when the disk layout changes, since the profile does not update itself.
History and Features
The problem of optimizing hard disk reads has been tackled by the following projects so far:
- readahead
- introduced into Hoary in 2004, Gutsy still has the very same version
- only solution that has ever been in Ubuntu
- no automatic profiling
- profiling steps: inotify the entire file system as very first step
- in initramfs, boot and record which files are read (and their order)
- the profile is saved, put into the package, and uploaded. This needs to happen for at least the beta, RC, and final releases, and we must not forget about it.
- high RAM usage
- Google SoC 2005
- Daemon, wakes up every 20 seconds and bursts reads which are currently in progress
- pure user-space solution
- no profiling necessary
- inefficient, since much of the data has likely been read already in the time between wake-ups
- Google SoC 2006
- introduces sysctl "open by inode number" to speed up file access
- above point bypasses file permissions and thus is a security risk
- no automatic profiling
- shuffling to move things to the disk front, which turned out to be non-optimal, and the implementation was bad, too
Google SoC 2007 (AutomaticBootAndApplicationPrefetchingSpec)
- Consists of a kernel patch for automatically profiling boot and application startup, and a userspace daemon "prefetch-process-trace" which acquires the kernel data and dynamically updates profiles.
- This speeds up things after booting, too.
- Purely dynamic profiling.
- Primary means of acceleration is a readahead-like component (reading required disk blocks in advance while disk I/O is not needed).
- Provides an experimental ext2/3 disk reordering tool (e2remapblocks) to optimize the layout for boot, for some additional speedup. This is transactional and can be safely interrupted at any time. When run with ionice, it is appropriate for a cron job. However, upstream currently does not recommend rolling it out by default, since it has not been tested thoroughly enough yet and might still destroy your file system.
- Tests on a loaded Kubuntu test machine:
- 1.6s faster startup for Firefox
- 3 seconds faster startup for OO.o
- Boot time drops from 52 to 46 seconds
- Usually faster than with readahead, seldom a bit slower.
- Cannot optimize the live system startup time (no solution provides that so far).
ColinWatson: Since it seems that prefetch would provide almost all of this (all we really need is a roughly sorted list of files to feed to mksquashfs; the rest of the code was written some time ago), if possible I would like one of the outputs of this specification to be a method of generating a sorted list of files that can be fed to other tools. (If that turns out not to be possible with prefetch, that's fine, but it seems straightforward to try.)
- Krzysztof Lichota (upstream)'s response:
In generic case it is possible but not efficient. Prefetch stores inode numbers, not paths. You can generate reverse mapping from inodes to paths, but it requires scanning the whole directory tree. There is a tool in our sources for this - I have done this for some tests. But if file is unlinked between recording accesses and generating reverse mapping, it might return wrong path (for some other file). As far as I understand you want to use it for generating SquashFS image, so it should not be a problem as files on CD-ROM do not change :) A bit of warning - the prefetch patch has been tested only on ext3. I have got information it causes oops on xfs, so a lot of testing is needed on various filesystems before it is deployed. I will try to fix this problem and some other things next month, but I would really use some help from kernel developers and testing from users.
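The reverse mapping described in this response can be illustrated with a minimal sketch. It is not the tool from the prefetch sources; the function names and the assumption that a recorded trace is a sequence of (device, inode) pairs are hypothetical and only serve to show the idea: scan the whole tree once, then translate the recorded order into an ordered list of paths.

```python
import os

def build_inode_map(root):
    """Scan the whole tree under root once and map (device, inode) -> path.

    For hard-linked files only the first path encountered is kept.
    """
    inode_to_path = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file disappeared while scanning
            inode_to_path.setdefault((st.st_dev, st.st_ino), path)
    return inode_to_path

def sorted_file_list(recorded, inode_to_path):
    """Return paths in recorded access order, without duplicates.

    recorded is assumed to be an iterable of (device, inode) pairs
    (hypothetical trace format). Inodes with no known path are skipped.
    """
    seen = set()
    result = []
    for key in recorded:
        path = inode_to_path.get(key)
        if path is not None and path not in seen:
            seen.add(path)
            result.append(path)
    return result
```

The caveat from the response still applies: if a file is unlinked and its inode number reused between recording and scanning, the map returns the path of a different file. On an unchanging tree, such as the one used to build a CD image, this should not occur.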
- Steve is Ubuntu's release manager. He is happy that he can drop the "update readahead profile" from the release checklist and does not need to update CDs for it any more.
- Joe installs Ubuntu 8.04 and installs a few applications. The next time he reboots, Ubuntu and OpenOffice start much faster.
- The kernel patch has been proposed upstream, but there has not been a response yet. It is relatively unintrusive, though: it is just a standalone kernel module which goes into linux-ubuntu-modules and the initramfs. Get it in early in Hardy for maximum test coverage.
- Review existing packaging of the user-space tool and bring it into Hardy's universe for more widespread testing. (Code is on https://launchpad.net/prefetch, documentation is on http://code.google.com/p/prefetch/). Due to its experimental state it will not go into main for Hardy, though.
- Drop readahead and the release checklist items.
Details about the implementation itself are on AutomaticBootAndApplicationPrefetchingSpec.
(TODO when beta is available)