A new semantic search engine for the KDE desktop
Loaded for Bear
Baloo replaces Nepomuk as the semantic search engine on the KDE desktop, but it gets off to a bumpy start.
The Nepomuk semantic search has been highly controversial since its introduction in KDE – both for users and for application developers. The release of KDE Applications 4.13 brings a new solution named Baloo to the KDE desktop – but not without some growing pains.
Desktop environments are pretty dumb, as users discover when they look for a file whose name they simply cannot remember. Semantic desktops were designed to address this problem. A semantic desktop has access to information about data, such as what data exists, when it was created, and how pieces of data relate to each other. Using this information, a user can, for example, search for a file on the hard disk that John Doe emailed in March.
The KDE developers wanted to give their desktop environment an appropriate semantic search engine. They turned to the results of the Nepomuk (Networked Environment for Personalized, Ontology-Based Management of Unified Knowledge; Figure 1) research project [1], which the European Union funded from 2006 to 2008 to the tune of several million euros. The project was intended to provide everything necessary to promote and simplify the development of a semantic desktop.
RDF and Other Calamities
Nepomuk required the RDF (Resource Description Framework) to describe and store relations [2]. The implementation of Nepomuk in KDE collects information about stored data from all KDE applications, links and processes this information, and serves it up to the search function.
Since the introduction of Nepomuk in KDE 4, many users have repeatedly complained about its poor performance and lack of stability. Application developers, in turn, found the API too complicated and called for extra features. Although the KDE developers made improvements over the years, many users continue to disable Nepomuk. Nepomuk generated excessive load especially when interacting with Akonadi, the underpinnings of the PIM programs.
Virtuoso, Vishuoso, and a Restart
The KDE developers identified the Virtuoso database, which ran in the background and which Nepomuk used to store its RDF data, as the major obstacle to better performance. Although Virtuoso itself worked pretty fast, it hogged extremely large amounts of memory. The KDE developers therefore started to implement their own RDF store named Vishuoso [3]. However, they continually stumbled over the requirements of the EU research project, which were partly vague, incomplete, and sometimes even redundant [4].
For these reasons, the KDE developers decided to take a radical step: They ditched RDF and the old Nepomuk and developed a successor named Baloo [5]. In contrast to Nepomuk, Baloo, which is named after a character in The Jungle Book (think "Bear Necessities"), is designed to be more frugal with resources, be more reliable, and deliver superior results faster. Internally, Baloo is modular, so in future, it will be easier to add new features and improve existing ones. Additionally, the design was overhauled with the intention of preventing failures.
The KDE developers have not written Baloo entirely from scratch; rather, they have recycled parts of the Nepomuk code. For this reason, they refer to Baloo as the "next generation of Nepomuk search." The changes to the programming interface are therefore manageable; application developers should find it relatively easy to adapt their software.
Relations and Stores
Baloo manages relations between two "uniquely identifiable identifiers." A file has the unique identifier file:<x>, whereas Akonadi creates a unique identifier of the form akonadi:?item=<x> for most PIM data.
Each relation ends up in a separate and appropriate data store. In the simplest case, this is a table with two columns in a SQLite database. Baloo deliberately does not require a specific storage format or data store API. The advantage of this setup is that the data store can be tailored to match the information to be stored, thus letting Baloo store the data in the best possible way.
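As a rough illustration of the simplest case, a relation between two identifiers can be held in a two-column SQLite table. The following Python sketch is not Baloo's actual schema: The table name relation, the column names, the helper functions, and the numeric IDs are all assumptions made for this example.

```python
import sqlite3

# Hypothetical two-column relation store; NOT Baloo's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation ("
    " subject TEXT NOT NULL,"
    " object TEXT NOT NULL,"
    " PRIMARY KEY (subject, object))"
)


def add_relation(subject, obj):
    """Store one relation; adding an existing pair again is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO relation (subject, object) VALUES (?, ?)",
        (subject, obj),
    )


def related_to(subject):
    """Return all identifiers related to the given one, sorted."""
    rows = conn.execute(
        "SELECT object FROM relation WHERE subject = ? ORDER BY object",
        (subject,),
    )
    return [row[0] for row in rows]


# Example identifiers in the two forms described above (IDs are made up).
add_relation("file:42", "akonadi:?item=7")
add_relation("file:42", "akonadi:?item=9")
add_relation("file:42", "akonadi:?item=7")  # duplicate, silently ignored
```

Calling related_to("file:42") then returns the two Akonadi identifiers linked to that file. A store for a different kind of relation could use an entirely different layout, which is the flexibility the design aims for.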
The search itself is performed by the search stores. Each search store only takes care of certain data. For example, one exclusively searches for files (File Search), another searches in email (Email Search), and a third searches in contacts (Contact Search). Each of these search stores provides a specific API through which the search can be triggered; however, they can also provide additional APIs.
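The division of labor between search stores can be pictured as follows. This Python sketch only mirrors the idea that each store is responsible for certain data types and exposes a common search entry point; the class names, the types attribute, and the run_query function are hypothetical and do not reproduce Baloo's C++ API.

```python
from abc import ABC, abstractmethod


class SearchStore(ABC):
    """Hypothetical base class: each store handles certain data types."""

    types = ()

    @abstractmethod
    def search(self, term):
        """Return identifiers or paths of items matching the term."""


class FileSearchStore(SearchStore):
    types = ("file",)

    def __init__(self, paths):
        self._paths = list(paths)

    def search(self, term):
        # Case-insensitive substring match on the path.
        return [p for p in self._paths if term.lower() in p.lower()]


class EmailSearchStore(SearchStore):
    types = ("email",)

    def __init__(self, subjects):
        self._subjects = dict(subjects)  # item ID -> subject line

    def search(self, term):
        # Case-insensitive substring match on the subject line.
        return [
            f"akonadi:?item={item_id}"
            for item_id, subject in sorted(self._subjects.items())
            if term.lower() in subject.lower()
        ]


def run_query(stores, data_type, term):
    """Hand the query to the first store responsible for data_type."""
    for store in stores:
        if data_type in store.types:
            return store.search(term)
    raise LookupError(f"no search store for type {data_type!r}")


# Made-up sample data for illustration only.
stores = [
    FileSearchStore(["/home/doe/report.odt", "/home/doe/notes.txt"]),
    EmailSearchStore({7: "Quarterly report", 9: "Lunch?"}),
]
```

With this setup, run_query(stores, "file", "report") only consults the file store, and run_query(stores, "email", "report") only consults the email store. A store is free to offer further methods beyond search, in line with the additional APIs mentioned above.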
Currently, Baloo only manages files and comes with a matching search store and data store. The collected data is stored in a SQLite database. The search is additionally supported by the Xapian software, which indexes the collected dataset. Akonadi already stores all its PIM data itself; the search in this dataset is handled by search stores for contacts and email [6]. Thanks to this new architecture with individual data and search stores, Baloo is unlikely to require too much memory and should deliver matching search results extremely quickly.