IntegratedDesktopSearch

Revision 21 as of 2006-10-06 16:02:52


Summary

Searching should be a natural part of the desktop. No matter what you are looking for, a search option should be readily available.

Rationale

Many applications (e.g. Nautilus, the Deskbar applet, ...) already offer some support for searching, but these systems are rather slow and rely on the old locate and find tools.

Currently there are two main solutions available: Beagle and Tracker. Both are already integrated into a number of applications. It would be good to enable those applications to use one of them as a backend, and to integrate them into more applications.

While Beagle currently seems to see wider acceptance, there are some pros and cons that suggest Tracker is the better choice:

  • Beagle is developed by Novell, which means it will have plenty of corporate support
  • Beagle is written in C#, which makes for a rather large dependency (storage-wise)
  • Beagle is more processor intensive than Tracker (VM vs. native C)
  • Tracker is written in C and communicates with other applications over a D-Bus interface
  • Tracker is extremely lightweight, and might even be suitable for embedded systems
  • Tracker uses mature technologies such as SQLite (MySQL is optional) and libextractor rather than technologies that are still in development
  • Tracker is able to store file metadata on demand
  • Tracker has an integrated keyword/tagging functionality for all first class objects
  • Tracker is lightning fast
  • Tracker implements the FDO Metadata Spec

Use cases

Scope

Generally this should be fairly easy to implement, as it is just a matter of packaging Tracker and recompiling some packages to use Tracker.

If time permits, it may also be possible to integrate Tracker into more applications (especially those that already integrate Beagle). Possible points of integration:

  • Nautilus search (Konqueror?)
  • File chooser
  • Rhythmbox/F-Spot/Banshee/Totem (could use Tracker as the library/archive for media content)
  • Panel applet (there's already a Deskbar plugin)
  • Web browser (index page content)
  • General search interface capable of searching all available content
  • Tagging of anything: files, music, images, emails, news
  • Others?

Design

Implementation

Code

Data preservation and migration

Outstanding issues

BoF agenda and discussion

Comments

  • KillerKiwi : I think this is all covered by Beagle / Nautilus Search and Deskbar in Dapper

    • MikkelKamstrupErlandsen : Nautilus is not linked against Beagle, so it does not use Beagle's search capabilities. Beagle is also not integrated into the file chooser (as I think Novell is doing), or into the web browser for that matter. The keyword is integration.

  • MikkelKamstrupErlandsen : The recent developments in Tracker show that it is maturing really fast. I have been using 0.0.4 on several boxes for a good while, and I'd say that it matches (or outperforms) Beagle on stability, memory and speed. The only setback is the number of "backends": Tracker still lacks support for things such as news, emails and Tomboy notes, though they will arrive eventually. A very energetic move would be for Ubuntu/Canonical to sponsor Jamie to work on Tracker. This would position Ubuntu/Canonical as bleeding-edge technology contributors to the free desktop, alongside Novell and Red Hat.

  • FryerFox : Does Tracker allow you to create more complex searches such as (type: pdf) and (dir: /reference/rfc) and (tcp near ip)? Beagle is an excellent application, but a major problem I have with desktop search is the lack of complex search facilities. This makes it quite unsuitable for a file search that you might want to restrict to a single directory, or to the file name rather than the contents, etc.

    • MikkelKamstrupErlandsen : Tracker allows for RDF queries, so yes, it allows for far more flexible searches than Beagle. You can, for example, search for all JPEGs with sizes between 800x600 and 1024x720 that were created before 2004.

      • FryerFox : Well, for me that seems to be the most important criterion for a long-term search solution. If the search solution is not extremely versatile, then it will (obviously) have limited application and can't be used as the foundation of an integrated system-wide search facility. I think we should keep in mind the possible uses of a search engine, rather than just the common current uses. For example, in GConf, return a key that matches var[\_]*[\d]+ and is of boolean type, or in Epiphany, find me the web pages I visited in the last week that contained a reference to a PDF and the words 'superposition principle' but not 'blind decomposition'.

        • MikkelKamstrupErlandsen : Well, Tracker can do all of that. Tracker is not a search engine as such. You should think of it as metadata storage and indexing. You can get, set, and search arbitrary metadata on any first class object (emails, documents, conversations etc.).

  • JackWasey : The problem is really that there is no consistent rich metadata. Each application and desktop environment has its own method, usually involving vast numbers of hidden folders. Using something like user_xattr or ["Reiser4"] extensively would vastly improve search and desktop usability. Imagine being able to tag any file or folder like [http://del.icio.us]... A good desktop system would automatically tag things it knew about, e.g. mp3 -> music, but you could add your own. This would beat the useless forced categorisation of MP3 genres and ID3 tags; and of course there are powerful use cases in other areas, especially multimedia.

    • MikkelKamstrupErlandsen : Extended attributes and Reiser4 are not a really portable way of storing metadata; they're more or less Linux-only. IMHO it is also better to use user-space tools for such tasks than kernel space. Also, Tracker is designed to do exactly this, and frankly, I'd trust Tracker over Reiser4 when it comes to stability. Why would mp3 files need a music tag when you can search by MIME type? Or simply use the built-in Music service of Tracker? I know that there is a Rhythmbox patch for Tracker tagging in the melting pot; it is a simple matter for other apps to follow suit...

  • Saads : How about GLScube, a semantic storage system? Check out the videos; it looks very promising. http://www.glscube.org/

  • ih : The Semantic Indexing Project also seems very promising [http://www.knowledgesearch.org/]. Also see [http://software.newsforge.com/article.pl?sid=06/09/19/1531258]

    • MikkelKamstrupErlandsen : Does SIP have desktop search facilities? They talk about distributed indexing and such...

  • MikkelKamstrupErlandsen : There is also [http://pinot.berlios.de/ Pinot] coming up. It already has a D-Bus interface and Deskbar integration. I haven't really tested it, though.
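
The structured queries discussed in the comments (all JPEGs between 800x600 and 1024x720 created before 2004, or every object carrying a user tag regardless of its type) can be sketched against a small SQLite metadata table, since Tracker is described above as using SQLite. This is a minimal illustration under loud assumptions: the schema, the table and column names (files, tags, uri, mime, width, height, created), the helper functions and the sample files are all hypothetical. It is not Tracker's actual database layout, nor its RDF query interface.

```python
import sqlite3


def build_store():
    """Create an in-memory metadata store (hypothetical schema, not Tracker's)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE files (
            uri     TEXT PRIMARY KEY,
            mime    TEXT,
            width   INTEGER,
            height  INTEGER,
            created TEXT   -- ISO 8601 date, so string comparison orders correctly
        );
        CREATE TABLE tags (uri TEXT REFERENCES files(uri), tag TEXT);
    """)
    # Hypothetical sample objects of different types.
    db.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
        [
            ("file:///photos/beach.jpg", "image/jpeg", 1024, 720, "2003-07-14"),
            ("file:///photos/city.jpg", "image/jpeg", 1600, 1200, "2003-02-01"),
            ("file:///photos/cat.jpg", "image/jpeg", 800, 600, "2005-03-09"),
            ("file:///music/song.mp3", "audio/mpeg", None, None, "2002-11-30"),
        ],
    )
    # User tags attach to any object, whatever its MIME type.
    db.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [("file:///photos/beach.jpg", "holiday"),
         ("file:///music/song.mp3", "holiday")],
    )
    return db


def jpegs_in_range(db, min_w, min_h, max_w, max_h, before):
    """JPEGs whose width and height lie in the given ranges, created before a date."""
    rows = db.execute(
        """SELECT uri FROM files
           WHERE mime = 'image/jpeg'
             AND width BETWEEN ? AND ?
             AND height BETWEEN ? AND ?
             AND created < ?
           ORDER BY uri""",
        (min_w, max_w, min_h, max_h, before),
    )
    return [uri for (uri,) in rows]


def tagged(db, tag):
    """Every object carrying the given user tag, sorted by URI."""
    rows = db.execute("SELECT uri FROM tags WHERE tag = ? ORDER BY uri", (tag,))
    return [uri for (uri,) in rows]
```

With the sample rows above, jpegs_in_range(db, 800, 600, 1024, 720, "2004-01-01") returns only file:///photos/beach.jpg (the city photo is too large and the cat photo is too recent), while tagged(db, "holiday") returns both the photo and the song, since a tag lookup does not care what kind of object it is.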


CategorySpec