RLocate

Status

Introduction

It would be nice to have an application that searches the filesystem instantly (like locate) but, instead of taking an insane amount of time to rebuild its database, monitors the filesystem continuously.

Rationale

locate is a great command-line tool, but it lacks real-time database updates.

Scope and Use Cases

Alice wants to start developing for KDE. She was told to look for foobar.h, but not which directory the file resides in. She has kio-locate installed, so foobar.h is shown after she types locate:foo in Konqueror's address bar.

Bob decides to rename the folder containing his photo collection. Ten minutes later he wants to show foo.jpg to his friend. Typing locate:foo.jpg in Konqueror shows foo.jpg at its new location.

Implementation Plan

rlocate (http://freshmeat.net/projects/rlocate/, http://rlocate.sourceforge.net/) is a straightforward replacement, but it needs its own kernel module, which conflicts with NSA SELinux (enabled) and Default Linux Capabilities (module) (in Security Options).

How it is done internally

Quote from http://rlocate.sourceforge.net/:

rlocate.ko is kernel module, that can be compiled and insmoded to the kernel. It has hooks for link, mkdir, mknod, open, rename and symlink system calls with help of lsm. It stores filenames to the internal list. Since all the file names have slash in the beginning it is replaced either with 'a' or 'm' depending on if a file or directory is added or directory is moved in the filesystem. Reason for that is, that directories that are renamed must be traversed in the rlocate daemon later.

The module communicates with rlocate daemon with help of /dev/rlocate device file. It writes all the file names with one file name per line to this device file.

rlocated is a daemon that reads /dev/rlocate device file and writes the paths without leading '/' to the rlocate diff database. It does that every 2 seconds. If a directory was moved, it will be traversed and entries in the directory and its subdirectories are written to the diff database.
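
The prefix encoding and the daemon's handling of moved directories described above can be sketched roughly as follows. This is an illustration only, not code from rlocate: the function names are hypothetical, and the mapping of 'a' to added entries and 'm' to moved directories is an assumption read from the quoted text.

```python
import os


def encode_event(path, moved_dir=False):
    """Replace the leading '/' with 'm' (moved directory) or 'a' (added entry).

    Assumption: 'a' marks an added file or directory, 'm' a moved directory.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    return ("m" if moved_dir else "a") + path[1:]


def expand_event(line):
    """Return the diff-database entries (paths without leading '/') for one line."""
    kind, rest = line[0], line[1:]
    entries = [rest]
    if kind == "m":
        # A moved directory must be traversed: everything below it has a new path.
        for dirpath, dirnames, filenames in os.walk("/" + rest):
            for name in dirnames + filenames:
                entries.append(os.path.join(dirpath, name)[1:])
    return entries
```

For example, `encode_event("/home/bob/foo.jpg")` gives `"ahome/bob/foo.jpg"`, and `expand_event` on that line gives `["home/bob/foo.jpg"]`.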

The locate command searches in rlocate database, which is the same as original slocate database, diff database and temporal diff database. The diff databases contain file names were added to the filesystem.

During the search also the file names are checked if they are accessible. If a file name was removed, it is not accessible, so it will be not shown.
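
A minimal sketch of such a search, assuming each database can be read as a sequence of paths stored without the leading '/'. The real slocate database format is different, and `locate` here is a hypothetical helper, not rlocate's implementation. `os.path.lexists` stands in for whatever accessibility check rlocate actually performs.

```python
import os


def locate(pattern, databases):
    """Yield paths containing `pattern`, skipping entries that no longer exist.

    `databases` is an iterable of iterables of paths without the leading '/',
    e.g. the main database, the diff database and the temporary diff database.
    """
    seen = set()
    for entries in databases:
        for entry in entries:
            path = "/" + entry
            # Removed files are still listed in a database but fail this check.
            if pattern in path and path not in seen and os.path.lexists(path):
                seen.add(path)
                yield path
```

Entries that appear in more than one database are reported once, and a removed file is silently dropped from the results.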

The diff database is copied to temporal diff database and is emptied in the beginning of the updatedb and the temporal diff database is removed after updatedb is done. The reason for that is, so that also during updatedb is running, which can take several minutes, the locate command returns the correct results.
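
The rotation of the diff database around an updatedb run might look like the sketch below. The file paths and the `rebuild_main_db` callback are hypothetical placeholders, and the databases are treated as plain files for illustration.

```python
import os
import shutil


def run_updatedb(diff_db, temp_diff_db, rebuild_main_db):
    """Rotate the diff database around a (possibly slow) rebuild."""
    # Keep the pending entries searchable while the rebuild is running.
    shutil.copyfile(diff_db, temp_diff_db)
    # Empty the diff database so that it only collects new entries from now on.
    open(diff_db, "w").close()
    rebuild_main_db()
    # The rebuilt main database now covers everything in the temporary copy.
    os.remove(temp_diff_db)
```

Note that in this sketch an entry written between the copy and the truncation would be lost; a real implementation would need locking or an atomic rename to avoid that.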

updatedb should be run daily e.g. as a cron job same as with slocate. Otherwise the searches would become slower with time.

updatedb runs in two modes - fast update or full update. Fast update takes 2 to 10 seconds to complete. Full update takes several minutes. The drawback of fast update is, that the database becomes more or less larger each day, since it contains also file names, that were removed. Fast update is run automatically every time, except the first time and every tenth time updatedb is started.
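
The choice between the two modes could be expressed as in the following sketch. It assumes runs are counted from 1 and that "every tenth time" means runs 10, 20, 30 and so on; the quoted text does not say exactly how rlocate counts, and `update_mode` is a hypothetical name.

```python
def update_mode(run_number):
    """Return "full" or "fast" for the given 1-based updatedb run number."""
    if run_number < 1:
        raise ValueError("run numbers start at 1")
    # Assumption: the first run and every tenth run are full updates.
    if run_number == 1 or run_number % 10 == 0:
        return "full"
    return "fast"
```

With a daily cron job this would give a full update roughly every ten days, which bounds how long removed file names linger in the database.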

Outstanding Issues

How to make rlocate compatible with Ubuntu's kernel?

  1. Fix rlocate to allow security module stacking (should solve the NSA SELinux conflict) and fix the conflict with Default Linux Capabilities, or
  2. Write a new ilocated daemon for rlocate which doesn't require a kernel module and a /dev/rlocate device to create the diff database. It should be possible to do this using inotify, though there are a few issues (TobiVollebregt performed some tests, http://home.casema.nl/vollebregt/soc/inotify-test.tar.bz2):

    • Memory usage. My entire "/" tree takes 20M of virtual memory without dictionary compression and 11M with simple dictionary compression. This seems like a lot to me for a daemon. It can be reduced by not remembering the entire tree: an inotify event contains the watch descriptor, allowing us to remove the watch if a directory is removed, and it contains the filename, allowing us to update the diff database without storing the files of each directory in the tree. However, because the inotify event only includes the filename, not the entire path, it remains necessary to build a tree in memory (or do some fancy stuff to link each WD to its full path, possibly using a temporary file).
    • Number of watches. Because a watch is needed for every directory (inotify doesn't monitor recursively), this number gets quite high. The default max_user_watches was set to 8192; I needed 62127 watches to watch my box.
    • Kernel memory usage. I don't know how this is handled, and I didn't see any adverse effects of 62127 watches on my box, but I doubt it's good practice to allocate ~1.7M of kernel memory (62127 watches times at least 28 bytes per watch).
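
One way to link each WD to its full path without keeping every file in memory is a flat table with one entry per directory, as in this sketch. It does not call inotify at all: the class and function names are hypothetical, and the watch descriptors are simulated with a counter where a real daemon would store the value returned by inotify_add_watch().

```python
import os


class WatchTable:
    """Maps watch descriptors to directory paths, one entry per directory."""

    def __init__(self):
        self.paths = {}
        self._next_wd = 1  # stand-in for the descriptor inotify would return

    def add_tree(self, root):
        """Register one (simulated) watch for every directory below root."""
        for dirpath, _dirnames, _filenames in os.walk(root):
            self.paths[self._next_wd] = dirpath
            self._next_wd += 1

    def resolve(self, wd, name):
        """Rebuild the full path from an event's watch descriptor and filename."""
        return os.path.join(self.paths[wd], name)

    def remove(self, wd):
        """Forget a directory once it has been removed."""
        self.paths.pop(wd, None)


def kernel_memory_estimate(watches, bytes_per_watch=28):
    """Lower bound on kernel memory, using the 28 bytes per watch quoted above."""
    return watches * bytes_per_watch
```

`kernel_memory_estimate(62127)` returns 1739556 bytes, which matches the ~1.7M figure above, and `len(table.paths)` after `add_tree("/")` is the number of watches that would have to fit under max_user_watches.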

Packages Affected

Conflicts: slocate

User Interface Requirements

The command-line interface is identical to that of locate. A frontend for KDE is already available.

RLocate (last edited 2008-08-06 16:41:26 by localhost)