Launchpad Entry: https://launchpad.net/distros/ubuntu/+spec/integrated-desktop-search
Created: 2006-04-200 by MikkelKamstrupErlandsen
Packages affected: nautilus, deskbar-applet, libgtk-2.0 (filechooser)
Searching should be a natural part of the desktop. No matter what you are looking for, a search option should be readily available.
In many applications (e.g. nautilus, deskbar-applet, ...) there is already some support for searching, but these systems are rather slow and they use the old locate and find tools.
Currently there are two main solutions available: Beagle and Tracker. Both are already integrated in a number of applications. It would be good to enable those applications to use one of the two as a backend, and to integrate it into some more applications.
While Beagle seems to see more acceptance right now, there are some pros and cons that seem to suggest Tracker is the better choice:
- Beagle is developed by Novell, which means it will have plenty of corporate support
- Beagle is written in C#, which makes for a rather large dependency (storage wise)
- Beagle is too processor intensive compared to Tracker (VM vs. native C)
- Tracker is written in C and communicates with other apps over a DBUS interface
- Tracker is extremely lightweight, and might even be suitable for embedded systems
- Tracker uses mature technologies such as SQLite and QDBM (mysql is optional)
- Tracker is able to store file metadata on demand
- Tracker has an integrated keyword/tagging functionality for all first class objects
- Tracker is lightning fast
- Tracker implements the FDO Metadata Spec
(This rationale isn't written from an objective point-of-view, see comments below)
Generally this should be fairly easy to implement, as it is just a matter of packaging Tracker and recompiling some packages to use Tracker.
If time permits it may also be possible to integrate Tracker in some more applications (especially those that already integrate Beagle). Possible points of integration:
- Nautilus search (Konqueror?)
- Rhythmbox/F-Spot/Banshee/Totem (could use tracker as the library/archive for media content)
- Panel applet (there's already a Deskbar plugin)
- Web browser (index page content)
- General search interface capable of searching all available contents
- Tagging of anything, files, music, images, emails, news
A great integration point would be Evolution. Currently, Evolution uses its own mail search system, which is fine when restricted to specific metadata (like From: or Subject:), but when searching message bodies Evolution can be very slow, especially when dealing with large folders. A better solution might be to query Beagle or Tracker for this information if either is present. Evolution's E-Plugin system should make this a feasible task.
Basically, the biggest part has already been done. Nautilus already lets you choose Tracker at compile time, and corresponding packages are already available: http://www.gnome.org/~jamiemcc/tracker/DEB/ These only need to be included in main.
For further integration in Nautilus, such as tagging support, the nautilus-python package has to move into main and the following plugin has to be installed by nautilus (it goes in ~/.nautilus/python-extensions/): http://cvs.gnome.org/viewcvs/*checkout*/tracker/python/nautilus/tracker-tags-tab.py?rev=1.1
The code for GtkFileChooser integration has been attached to this bug https://bugs.launchpad.net/ubuntu/+source/gtk+2.0/+bug/49608
Data preservation and migration
BoF agenda and discussion
MikkelKamstrupErlandsen : Work has started to create a unified spec and dbus api for desktop indexers (and metadata storage layers in the future). The discussions are going on at the xdg list on FDO. There's a draft spec for a simple search api online.
JoeShaw : The rationale above is not written from a neutral or objective point-of-view. Although it mentions "pros and cons", it lists none of the pros of Beagle or any cons of Tracker. Some (like the CPU usage assertion) are either untrue or impossible to objectively measure. And the implication that Beagle is based on immature technology is patently false. (Beagle has used SQLite for over two years, and it uses the Lucene text indexing, which has been around for several years on Java and roughly 3 years on .Net.) Although I believe Beagle to be the stronger technology at this point personally, it is important to have objective analysis before making a decision.
KillerKiwi : I think this is all covered by Beagle / Nautilus Search and Deskbar in Dapper
MikkelKamstrupErlandsen : Nautilus is not linked against Beagle, so it does not utilize its searching capabilities. Also, Beagle is not integrated into the filechooser (as I think Novell is doing), or in the web browser for that matter.
JoeShaw : Nautilus can be linked against Beagle, it's a Nautilus configure-time flag. In fact, the Nautilus search infrastructure was developed for Beagle, by Beagle developers. Also, the filechooser patch has been tweaked by the GTK+ maintainers and will probably soon go upstream.
The keyword is integration. Check out This Bug for more details
MikkelKamstrupErlandsen: Integration - totally agree on that. I know how Beagle/Tracker can link to Nautilus. What I meant was that the current Ubuntu Nautilus is not linked against Beagle. I saw that Matthias Clasen committed a Beagle/Tracker patch on Fedora rawhide recently.
- I also agree that the above list is quite biased. It kinda lists Beagle vs Tracker, and I think that is a bad idea - especially with the current low factual level regarding the technical details.
Fabricecolin: Integration is cool, I just wish each project didn't have to reinvent the wheel. As far as I know, the Tracker guys had more or less to replicate the work the Beagle guys did to plug into the filechooser and Nautilus. If there was a common (dbus ?) interface defined for search services and a way to advertise such a new service to consumer apps, that would be great. Is this something that would fall under the freedesktop.org umbrella ?
MikkelKamstrupErlandsen: Fabrice, see my reply above
Fabricecolin: nice one !
MikkelKamstrupErlandsen : The recent developments on Tracker show that it is maturing really fast. I have been using 0.0.4 on several boxes for a good while, and I'd say that it matches (or outperforms) Beagle on stability, memory and speed. The only setback is the number of "backends", where Tracker still misses things such as news, emails and tomboy notes - they will arrive eventually though. The very energetic move would be for Ubuntu/Canonical to sponsor Jamie to work on Tracker. This would be a move that would position Ubuntu/Canonical as bleeding-edge technology contributors to the free desktop alongside Novell and Red Hat.
FryerFox : Does Tracker allow you to create more complex searches such as (type: pdf) and (dir: /reference/rfc) and (tcp near ip)? Beagle is an excellent application, but a major problem I have with Desktop Search is the lack of complex searching facilities. This makes it quite unsuitable for a file search which you might want to be restricted to a single directory or only deal with the file name and not the contents, etc.
MikkelKamstrupErlandsen : Tracker allows for RDF queries, so - yes, it allows for far more flexible searches than Beagle. You can for example search for all jpegs with sizes between 800x600 and 1024x720 that were created before 2004.
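As a rough illustration of the kind of metadata query described in the comment above, the sketch below filters a few in-memory records in plain Python. It does not use Tracker's API or its RDF query syntax; the record fields (`mime`, `width`, `height`, `created`), the helper name `jpegs_between` and the sample data are all assumptions made up for this example.

```python
from datetime import date

# Hypothetical metadata records; the field names are illustrative only
# and are not taken from Tracker's schema.
FILES = [
    {"path": "a.jpg", "mime": "image/jpeg", "width": 800, "height": 600, "created": date(2003, 5, 1)},
    {"path": "b.jpg", "mime": "image/jpeg", "width": 1600, "height": 1200, "created": date(2002, 1, 9)},
    {"path": "c.png", "mime": "image/png", "width": 1024, "height": 720, "created": date(2001, 3, 3)},
    {"path": "d.jpg", "mime": "image/jpeg", "width": 1024, "height": 720, "created": date(2005, 7, 7)},
    {"path": "e.jpg", "mime": "image/jpeg", "width": 1024, "height": 720, "created": date(2003, 12, 31)},
]


def jpegs_between(files, min_size, max_size, before_year):
    """Return paths of JPEGs whose dimensions lie within the given
    inclusive bounds and that were created before `before_year`."""
    min_w, min_h = min_size
    max_w, max_h = max_size
    return [
        f["path"]
        for f in files
        if f["mime"] == "image/jpeg"
        and min_w <= f["width"] <= max_w
        and min_h <= f["height"] <= max_h
        and f["created"].year < before_year
    ]


if __name__ == "__main__":
    print(jpegs_between(FILES, (800, 600), (1024, 720), 2004))
```

The point of the sketch is only that such a search combines a type filter, two numeric ranges and a date bound, which a plain keyword box cannot express.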
FryerFox : Well, that for me seems to be the most important criterion for a long-term searching solution. If the search solution is not extremely versatile, then it will (obviously) have limited application and can't be used as the founding infrastructure for an integrated system searching facility. I think we should keep in mind possible uses of a search engine, rather than common current uses. For example, in GConf, return me a key that matches var[\_]*[\d]+ and is of boolean type, or in Epiphany, find me the web pages I visited in the last week that contained a reference to a PDF and the words 'superposition principle' but not 'blind decomposition'.
MikkelKamstrupErlandsen : Well, Tracker can do all of that. Tracker is not a search engine as such. You should think of it as metadata storage and indexing. You can get, set, and search arbitrary metadata on any first class object (emails, documents, conversations etc.).
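To make the GConf example from FryerFox's comment concrete, here is a minimal Python sketch that applies the pattern `var[\_]*[\d]+` to a small dictionary of keys and keeps only boolean values. The key names and the `boolean_keys_matching` helper are hypothetical, and the dictionary merely stands in for a configuration store; neither GConf's nor Tracker's API is used.

```python
import re

# The pattern quoted in the discussion: "var", optional underscores, then digits.
PATTERN = re.compile(r"var[\_]*[\d]+")

# Hypothetical configuration keys and values standing in for a GConf tree.
KEYS = {
    "/apps/demo/var_1": True,
    "/apps/demo/var22": False,
    "/apps/demo/var_3": 7,
    "/apps/demo/variable": True,
    "/apps/demo/name": "x",
}


def boolean_keys_matching(keys, pattern):
    """Return the sorted key names that contain a match for `pattern`
    and whose value is a boolean."""
    return sorted(
        name
        for name, value in keys.items()
        if pattern.search(name) and isinstance(value, bool)
    )


if __name__ == "__main__":
    print(boolean_keys_matching(KEYS, PATTERN))
```

Here "matches" is read as "contains a match anywhere in the key name"; an anchored match on the last path component would be an equally plausible reading.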
JoeShaw : You can do many of these searches with Beagle as well. You can search for PDFs by typing in "ext:pdf". Slop and proximity searches are something we'll probably be adding soon (as Lucene supports those natively). Containing directories are a little trickier, but we'll hopefully have support for those soon as well. Also, you can build arbitrarily complex searches with Beagle programmatically with its API; a simple text entry can't expose all the features of the underlying engine. (Date range searches, for example, can be run.)
MikkelKamstrupErlandsen: Yeah, Lucene makes Beagle really really strong, that is inarguable. While sloppy and proximity searches are a good while away in Tracker (at least with my limited understanding), the ability to store and query metadata with RDF is a great plus on Tracker's side... See e.g. the tech-demo/toy note app I created called Daze using the metadata handling of Tracker. I must admit I don't know if similar stuff is possible with Beagle..?
Fabricecolin : This type of query is supported by Pinot (more accurately, by Xapian's QueryParser). It would be something like "type:application/pdf dir:/reference/rfc tcp near ip". The syntax is close to that supported by some Web search engines.
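The query syntax quoted above mixes `prefix:value` filters with free-text terms. The following Python sketch shows one simple way such a string could be split into those two parts. It is not Xapian's QueryParser and does not interpret operators such as `near`; the function name `parse_query` and the set of recognised prefixes are assumptions for illustration.

```python
# Prefixes treated as filters in this sketch; an assumption, not Pinot's list.
KNOWN_PREFIXES = {"type", "dir"}


def parse_query(query):
    """Split a query string into (filters, terms).

    Tokens of the form prefix:value with a known prefix become filters;
    every other token is kept, in order, as a free-text term.
    """
    filters = {}
    terms = []
    for token in query.split():
        prefix, sep, value = token.partition(":")
        if sep and value and prefix in KNOWN_PREFIXES:
            filters[prefix] = value
        else:
            terms.append(token)
    return filters, terms


if __name__ == "__main__":
    print(parse_query("type:application/pdf dir:/reference/rfc tcp near ip"))
```

A real parser would go on to treat `near` as a proximity operator between `tcp` and `ip`; the sketch deliberately leaves it as an ordinary term.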
JackWasey : The problem is really that there is no consistent rich meta-data. Each application and desktop environment has its own method, usually involving vast numbers of .hidden folders. Using something like user_xattr or Reiser4 extensively would vastly improve search, and desktop usability. Imagine being able to tag any file or folder like http://del.icio.us... A good desktop system would automatically tag things it knew about, e.g. mp3 -> music, but you could add your own. This would beat the useless forced categorisation of mp3 genre, and id3 tags; and of course there are powerful use cases in other areas, especially of multimedia.
MikkelKamstrupErlandsen : Extended attributes and Reiser4 are not a really portable way of storing metadata - it's kinda Linux only. IMHO it is also better to use user space tools for such tasks than kernel space. Also Tracker is designed to do exactly this - and frankly, I'd trust Tracker over Reiser4 when it comes to stability. Why would there need to be a music tag on mp3 files when you can search by mime-types? Or simply use the built-in Music service of Tracker? I know that there is a Rhythmbox patch for Tracker tagging in the melting pot, it is a simple matter for other apps to follow suit...
Saads : How about GLScube - a semantic storage system. Check out the videos it looks very promising. http://www.glscube.org/
ih : The Semantic Indexing Project also seems very promising http://www.knowledgesearch.org/. Also see http://software.newsforge.com/article.pl?sid=06/09/19/1531258
MikkelKamstrupErlandsen : Does the SIP have desktop search facilities... They talk about distributed indexing and such...
KevinKubasik: While some people have mentioned a lot of really 'cool' programs, I think we need to look at what can provide for what most users are going to want to do. At the moment it seems that the closest contention would be between Beagle and Tracker. What most people seem to be missing about Beagle is its vastly superior backends. For example, Tracker just started to implement Evolution indexing support, which is great, but the data isn't linked to a usable URI or identifier. Beagle not only indexes Mail, Contacts, Calendars, Chats, Notes, and WebHistory, but it also provides some meaningful or useful way of opening and accessing that data. Beagle isn't just a lot of cool ideas about to come together; it is functional, useful, and helpful.
- Extensibility: Beagle has a highly pluggable architecture that encourages the rapid development of additional Backends and Filters. I wrote a backend to query Banshee's database of music in about 30 minutes, and with Beagle's external filters system, almost any file can be cracked open and examined.
- Performance: I know Beagle has a bad rap on this front, however incredible progress has been made on slimming Beagle down, and Beagle is very intelligent about active indexing; it is even battery-aware on laptops.
- Built for Text: Beagle is based upon the extremely proven Lucene indexing engine, which is fast and extremely powerful. Most of all, it was designed to index great amounts of text and search that text quickly. Many of the other options being considered are using less tested or reliable systems. This also means that Beagle scales incredibly well. Searches are still under a second for thousands of e-mails and gigs of documents.
- Ease of Integration: In my personal opinion, this is the single greatest advantage Beagle has when considering a backend for this spec. Beagle has a simple and powerful query api, almost deceptively so. Beagle sports powerful query parsing and metadata queries for everyone. Even a simple textbox with a minimalistic GUI can perform complex boolean queries, meaning that desktop integration is as simple as smooth filechooser and maybe mail integration, not a complete implementation of an immensely complex query parser.
- My apologies for the ramble, didn't really mean to make it that long. But please feel free to ask or challenge anything I've said, I'm reasonable and want the best desktop we can get. I think that if people really have their doubts, try each for a week, and see which one you get the most use out of.
MikkelKamstrupErlandsen : There is an ongoing effort to create a unified dbus api for desktop search engines on the xdg mailing list at freedesktop.org (it's called the Wasabi project - look for the "simple search api" thread). I expect we will have a first official draft soonish, and I also think this is the right long term approach. This way end user apps don't have to know anything about the underlying search engine...
AndreasHeinz : I have used Beagle for some time now, and will try Tracker from now on. What I don't like about Beagle is its dependence on Mono/.NET. In my eyes this leads to a slow and memory-hungry application (and yes, I know and have seen Beagle's great improvement in this area). I had the same experience with F-Spot (also dependent on Mono), which is, for me at least, a lot slower than e.g. digikam. I can imagine that developing with Mono can make development faster, but when this in the end means that the application will use much more memory/much more cpu than an other application developed with c++/c, this makes my decision to use the application developed in c++/c very easy. I would be very interested to see the comparison by Michal Pryc and Steven Xusheng Hou of Sun done again with a recent release of Beagle.
JoeShaw: "that the application will use much more memory/much more cpu than an other application developed with c++/c" This just isn't true. It's certainly not the case for CPU, and it isn't necessarily true about memory either, although you have to be a little more careful about memory usage with a managed environment (like Mono or Python), and until recently the tools for profiling with Mono were virtually nonexistent. While it's true that in theory you simply cannot write a C# (or Python) program smaller and faster than a C one (although I don't agree with C++), in practice that's not always the case. I'm not disputing that Tracker does use less memory than Beagle (although feature parity isn't there), only pointing out that your blanket statement is inaccurate.
AndreasHeinz: That's unfortunately my experience. I can qualify this for CPU but not for memory usage. As I said, this is my experience and I didn't want to make a blanket statement. Of course it depends on many, many factors.