IntegratedDesktopSearch

Differences between revisions 27 and 28
Revision 27 as of 2006-11-22 13:41:00
Size: 10794
Editor: cm177
Comment:
Revision 28 as of 2006-11-22 13:52:35
Size: 11270
Editor: cm177
Comment:

Summary

Searching should be a natural part of the desktop. No matter what you are looking for, a search option should be readily available.

Rationale

In many applications (e.g. nautilus, deskbar-applet, ...) there is already some support for searching, but these systems are rather slow and they use the old locate and find tools.

Currently there are two main solutions available: Beagle and Tracker. Both are already integrated in a number of applications; it would be good to pick one of them as the backend and integrate it into some more applications.

While Beagle seems to see more acceptance right now, there are some pros and cons that seem to suggest Tracker is the better choice:

  • Beagle is developed by Novell, which means it will have plenty of corporate support
  • Beagle is written in C#, which makes for a rather large dependency (storage wise)
  • Beagle is too processor intensive compared to Tracker (VM vs. native C)
  • Tracker is written in C and communicates with other apps over a DBUS interface
  • Tracker is extremely lightweight, and might even be suitable for embedded systems
  • Tracker uses mature technologies such as SQLite and QDBM (mysql is optional)
  • Tracker is able to store file metadata on demand
  • Tracker has an integrated keyword/tagging functionality for all first class objects
  • Tracker is lightning fast
  • Tracker implements the FDO Metadata Spec

(This rationale isn't written from an objective point-of-view, see comments below)

Use cases

Scope

Generally this should be fairly easy to implement, as it is just a matter of packaging Tracker and recompiling some packages to use Tracker.

If time permits, it may also be possible to integrate Tracker into some more applications (especially those that already integrate Beagle). Possible points of integration:

  • Nautilus search (Konqueror?)
  • Filechooser
  • Rhythmbox/F-Spot/Banshee/Totem (could use tracker as the library/archive for media content)
  • Panel applet (there's already a Deskbar plugin)
  • Web browser (index page content)
  • General search interface capable of searching all available contents
  • Tagging of anything, files, music, images, emails, news
  • Others?

Design

Implementation

Basically, the biggest part has already been done. Nautilus already lets you choose Tracker at compile time, and corresponding packages are already available: http://www.gnome.org/~jamiemcc/tracker/DEB/ These only need to be included in main.

For further integration in Nautilus, such as the tagging support, the nautilus-python package has to move into main and the following plugin has to be installed for Nautilus (it goes in ~/.nautilus/python-extensions/): http://cvs.gnome.org/viewcvs/*checkout*/tracker/python/nautilus/tracker-tags-tab.py?rev=1.1
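
As a rough illustration of the plugin installation step, the following Python sketch copies an already-downloaded plugin file into the per-user extensions directory named above. The function name `install_nautilus_extension` and its `home` parameter are hypothetical helpers introduced for this example only; they are not part of Nautilus, nautilus-python, or Tracker.

```python
import shutil
from pathlib import Path


def install_nautilus_extension(plugin_file, home=None):
    """Copy a plugin file into <home>/.nautilus/python-extensions/.

    Hypothetical helper for illustration; `home` defaults to the
    current user's home directory and exists mainly to ease testing.
    """
    home_dir = Path(home) if home is not None else Path.home()
    target_dir = home_dir / ".nautilus" / "python-extensions"
    # Create the extensions directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(plugin_file).name
    shutil.copy(plugin_file, target)
    return target
```

Calling `install_nautilus_extension("tracker-tags-tab.py")` would place the file under the current user's home directory. Packaging the plugin properly would presumably replace this manual step.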

Code

Data preservation and migration

Outstanding issues

BoF agenda and discussion

Comments

  • JoeShaw : The rationale above is not written from a neutral or objective point-of-view. Although it mentions "pros and cons", it lists none of the pros of Beagle or any cons of Tracker. Some (like the CPU usage assertion) are either untrue or impossible to objectively measure. And the implication that Beagle is based on immature technology is patently false. (Beagle has used SQLite for over two years, and it uses the Lucene text indexing, which has been around for several years on Java and roughly 3 years on .Net.) Although I believe Beagle to be the stronger technology at this point personally, it is important to have objective analysis before making a decision.

  • KillerKiwi : I think this is all covered by Beagle / Nautilus Search and Deskbar in Dapper

    • MikkelKamstrupErlandsen : Nautilus is not linked against Beagle, so it does not utilize its searching capabilities. Also, Beagle is not integrated into the filechooser (as I think Novell is doing), or into the web browser for that matter.

      • JoeShaw : Nautilus can be linked against Beagle, it's a Nautilus configure-time flag. In fact, the Nautilus search infrastructure was developed for Beagle, by Beagle developers. Also, the filechooser patch has been tweaked by the GTK+ maintainers and will probably soon go upstream.

        The keyword is integration.

        • MikkelKamstrupErlandsen: Integration - totally agree on that. I know how Beagle/Tracker can link to Nautilus. What I meant was that the current Ubuntu Nautilus is not linked against Beagle. I saw that Matthias Clasen committed a Beagle/Tracker patch on Fedora rawhide recently.

        • I also agree that the above list is quite biased. It kinda lists Beagle vs Tracker, and I think that is a bad idea - especially with the current low factual level regarding the technical details.
          • Fabricecolin: Integration is cool, I just wish each project didn't have to reinvent the wheel. As far as I know, the Tracker guys had more or less to replicate the work the Beagle guys did to plug into the filechooser and Nautilus. If there was a common (dbus ?) interface defined for search services and a way to advertise such a new service to consumer apps, that would be great. Is this something that would fall under the freedesktop.org umbrella ?

  • MikkelKamstrupErlandsen : The recent developments on Tracker show that it is maturing really fast. I have been using 0.0.4 on several boxes for a good while, and I'd say that it matches (or outperforms) Beagle on stability, memory and speed. The only setback is the number of "backends", where Tracker still misses things such as news, emails and Tomboy notes - they will arrive eventually though. The very energetic move would be for Ubuntu/Canonical to sponsor Jamie to work on Tracker. This would be a move that would position Ubuntu/Canonical as bleeding-edge technology contributors to the free desktop alongside Novell and Red Hat.

  • FryerFox : Does Tracker allow you to create more complex searches such as (type: pdf) and (dir: /reference/rfc) and (tcp near ip)? Beagle is an excellent application, but a major problem I have with Desktop Search is the lack of complex searching facilities. This makes it quite unsuitable for a file search which you might want to be restricted to a single directory or only deal with the file name and not the contents, etc.

    • MikkelKamstrupErlandsen : Tracker allows for RDF queries, so - yes, it allows for far more flexible searches than Beagle. You can for example search for all JPEGs with sizes between 800x600 and 1024x720 that were created before 2004.

      • FryerFox : Well, that for me seems to be the most important criterion for a long-term searching solution. If the search solution is not extremely versatile, then it will (obviously) have limited application and can't be used as the founding infrastructure for an integrated system searching facility. I think we should keep in mind possible uses of a search engine, rather than common current uses. For example, in GConf, return me a key that matches var[\_]*[\d]+ and is of boolean type, or in Epiphany, find me the web pages I visited in the last week that contained a reference to a PDF and the words 'superposition principle' but not 'blind decomposition'.

        • MikkelKamstrupErlandsen : Well, Tracker can do all of that. Tracker is not a search engine as such. You should think of it as metadata storage and indexing. You can get, set, and search arbitrary metadata on any first class object (emails, documents, conversations etc.).

    • JoeShaw : You can do many of these searches with Beagle as well. You can search for PDFs by typing in "ext:pdf". Slop and proximity searches are something we'll probably be adding soon (as Lucene supports those natively). Containing directories are a little trickier, but we'll hopefully have support for those soon as well. Also, you can build arbitrarily complex searches with Beagle programmatically with its API; a simple text entry can't expose all the features of the underlying engine. (Date range searches, for example, can be run.)

      • MikkelKamstrupErlandsen: Yeah, Lucene makes Beagle really really strong, that is inarguable. While sloppy and proximity searches are a good while away in Tracker (at least with my limited understanding), the ability to store and query metadata with RDF is a great plus on Tracker's side... See e.g. the [http://www.grillbar.org/wordpress/?p=173 tech-demo/toy note app I created called Daze] using the metadata handling of Tracker. I must admit I don't know if similar stuff is possible with Beagle..?

    • Fabricecolin : This type of query is supported by Pinot (more accurately, by Xapian's [http://www.xapian.org/docs/queryparser.html QueryParser]). It would be something like "type:application/pdf dir:/reference/rfc tcp near ip". The syntax is close to that supported by some Web search engines.

  • JackWasey : The problem is really that there is no consistent rich meta-data. Each application and desktop environment has its own method, usually involving vast numbers of .hidden folders. Using something like user_xattr or ["Reiser4"] extensively would vastly improve search and desktop usability. Imagine being able to tag any file or folder like [http://del.icio.us]... A good desktop system would automatically tag things it knew about, e.g. mp3 -> music, but you could add your own. This would beat the useless forced categorisation of mp3 genre and id3 tags; and of course there are powerful use cases in other areas, especially multimedia.

    • MikkelKamstrupErlandsen : Extended attributes and Reiser4 are not a really portable way of storing metadata - it's kinda Linux only. IMHO it is also better to use user space tools for such tasks than kernel space. Also Tracker is designed to do exactly this - and frankly, I'd trust Tracker over Reiser4 when it comes to stability. Why would there need to be a music tag on mp3 files when you can search by mime-types? Or simply use the built-in Music service of Tracker? I know that there is a Rhythmbox patch for Tracker tagging in the melting pot; it is a simple matter for other apps to follow suit...

  • Saads : How about GLScube - a semantic storage system. Check out the videos it looks very promising. http://www.glscube.org/

  • ih : The Semantic Indexing Project also seems very promising [http://www.knowledgesearch.org/]. Also see [http://software.newsforge.com/article.pl?sid=06/09/19/1531258]

    • MikkelKamstrupErlandsen : Does the SIP have desktop search facilities... They talk about distributed indexing and such...

  • MikkelKamstrupErlandsen : There is also [http://pinot.berlios.de/ Pinot] coming up. It has a dbus interface and deskbar integration already. Haven't really tested it though.
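
To make the structured-query discussion in the comments above more concrete, here is a minimal Python sketch of the example MikkelKamstrupErlandsen mentions: all JPEGs with sizes between 800x600 and 1024x720 that were created before 2004. It filters plain in-memory records and does not use Tracker's RDF query interface or any Beagle API. The `FileMeta` type, the `find_images` function, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileMeta:
    """Hypothetical metadata record for one indexed file."""
    path: str
    mime: str
    width: int
    height: int
    created: date


def find_images(records, mime, min_size, max_size, created_before):
    """Return paths whose type, dimensions and creation date all match."""
    min_w, min_h = min_size
    max_w, max_h = max_size
    return [
        r.path
        for r in records
        if r.mime == mime
        and min_w <= r.width <= max_w
        and min_h <= r.height <= max_h
        and r.created < created_before
    ]


# Made-up sample data, not taken from any real index.
records = [
    FileMeta("/photos/beach.jpg", "image/jpeg", 1024, 720, date(2003, 7, 1)),
    FileMeta("/photos/tiny.jpg", "image/jpeg", 320, 240, date(2002, 1, 5)),
    FileMeta("/photos/new.jpg", "image/jpeg", 800, 600, date(2005, 3, 9)),
    FileMeta("/docs/rfc793.pdf", "application/pdf", 0, 0, date(1999, 1, 1)),
]

matches = find_images(
    records, "image/jpeg", (800, 600), (1024, 720), date(2004, 1, 1)
)
```

The point of the sketch is only that such a query combines several metadata constraints at once, which a single free-text entry cannot easily express. A real backend would evaluate the same constraints against its own index.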


CategorySpec

IntegratedDesktopSearch (last edited 2008-08-06 16:37:54 by localhost)