DocFetcher
Bloodhound
DocFetcher is a practical local search tool that is easy to configure and use – even for large data collections.
Modern operating systems take up several gigabytes of space just for their many applications, and they sometimes contain several hundred thousand individual files. If you add an extensive music or photo collection, you can quickly lose track.
Modern desktop environments offer indexing and search applications for existing data, and Linux has several dedicated search programs. However, many of these programs are not very intuitive, and some even expect you to install a database as a back end. In addition, many of these tools do not support full-text searches. If you are looking for a lean, practical, and powerful search tool for your workstation, DocFetcher is an interesting alternative.
You can download the Java application from the project page, where you will also find installation instructions [1]. As a prerequisite, you need a reasonably up-to-date Java runtime environment; DocFetcher works well with current OpenJDK versions, which you can usually install directly from your distribution's software repositories.
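Before unpacking the archive, it can help to confirm that a Java runtime is actually on the search path. The following is a minimal sketch and not part of DocFetcher; the helper name require_cmd is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of DocFetcher): succeed only if the named
# command can be found on the current PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether a Java runtime is available before installing DocFetcher.
if require_cmd java; then
    java -version 2>&1 | head -n 1   # the JVM prints its version banner on stderr
else
    echo "No Java runtime found; install an OpenJDK package from your distribution first."
fi
```

If the check fails, install an OpenJDK package with your distribution's package manager and run it again.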
Unpack the downloaded ZIP archive with the DocFetcher files using a tool like Ark, File Roller, or Xarchiver. You can then move the subdirectory you created to a directory of your choice. To start the program from a desktop menu, however, you need to manually create a menu entry (see the box entitled "Installation").
Installation
Many Linux distributions do not include DocFetcher in their package sources. Ubuntu, for example, does not yet include a package for DocFetcher. It is thus often necessary to install DocFetcher manually.
Listing 1 shows how to unpack the ZIP archive downloaded from the project page into the /usr/local/bin/ directory. In Listing 2, you will find the content for /usr/share/applications/docfetcher.desktop to help you create a matching entry in the Start menu of the desktop environment.
Adjust the version number in the commands if necessary. If you prefer a location other than /usr/local/bin/docfetcher/, remember to change the paths appropriately. If you are still using a system without GTK3 libraries, you also need to swap DocFetcher-GTK3.sh for DocFetcher-GTK2.sh in the Exec line.
Listing 1
Unzipping DocFetcher
$ unzip docfetcher-1.1.19-portable.zip
$ sudo mv DocFetcher-1.1.19/ /usr/local/bin/docfetcher
Listing 2
Creating a Menu Entry
[Desktop Entry]
Version=1.0
Name=DocFetcher
GenericName=Document Index and Search
X-GNOME-FullName=DocFetcher Document Index and Search
Comment=Index and Search your computer
Type=Application
Categories=System;Utility;FileTools;Java;
Exec=/usr/local/bin/docfetcher/DocFetcher-GTK3.sh
Terminal=false
StartupNotify=true
Icon=/usr/local/bin/docfetcher/img/docfetcher128.png
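A quick sanity check can catch a menu entry that the desktop environment would silently ignore. The sketch below is only an illustration: check_desktop_entry is a hypothetical helper that tests for the [Desktop Entry] group header and the Type, Name, and Exec keys, nothing more. Where the desktop-file-utils package is installed, desktop-file-validate performs a far more thorough check.

```shell
#!/bin/sh
# Hypothetical helper: verify that a .desktop file starts with the
# [Desktop Entry] group header and defines the Type, Name, and Exec keys.
check_desktop_entry() {
    entry_file=$1
    [ -r "$entry_file" ] || { echo "cannot read $entry_file"; return 1; }
    # The first line that is neither blank nor a comment must be the header.
    first_line=$(awk '!/^[[:space:]]*$/ && !/^#/ { print; exit }' "$entry_file")
    [ "$first_line" = "[Desktop Entry]" ] || { echo "missing [Desktop Entry] header"; return 1; }
    for key in Type Name Exec; do
        grep -q "^${key}=" "$entry_file" || { echo "missing key: $key"; return 1; }
    done
    echo "ok"
}

# Example call, using the location named in the box:
#   check_desktop_entry /usr/share/applications/docfetcher.desktop
```

The helper prints ok and returns 0 if the file passes; otherwise it names the first problem it finds and returns a non-zero status.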
Start Your Engines
When you first launch DocFetcher, some systems start with a dialog where you can change the keyboard shortcut from the default ([Ctrl]+[F8]). If the shortcut is already mapped, a message asks you to confirm by pressing OK. The program window, which is divided into five panes, then appears. In the top-left corner, you will find an input field for the minimum and maximum file size that DocFetcher should consider for the search.
Select the file types you want DocFetcher to find from a dropdown list; the program enables all supported formats by default. Below is the search area, and top-right is an input line for the search terms. Below this area, the software lists the results with information on match relevance and file size; an area in the bottom right displays the contents of the selected file (Figure 1).
DocFetcher needs to index the contents of the mounted storage media in order to search reliably and quickly even in large data sets. You can trigger this indexing from the Create Index From dialog, which you can access by right-clicking in the search area in the bottom-left of the main window. Then select either a folder or an archive file. In Microsoft environments, DocFetcher supports indexing of PST files containing messages, contacts, tasks, or appointments.
To limit the size of files that the program should consider, enter the minimum and maximum values in the boxes in the upper-left corner. The process of indexing the data collection, which relies on Apache Lucene [2], takes some time during the first run, but this step will significantly speed up searching in these folders (Figure 2).
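To get a feel for how many files a given minimum and maximum size would cover, you can preview the range on the command line first. This is a rough sketch with find that works on byte counts; files_in_size_range is a hypothetical helper and has nothing to do with how DocFetcher applies its own size limits.

```shell
#!/bin/sh
# Hypothetical helper: print regular files under a directory whose size in
# bytes lies between min and max (inclusive).
files_in_size_range() {
    dir=$1; min_bytes=$2; max_bytes=$3
    # "-size -Nc" is true for files smaller than N bytes and "-size +Nc" for
    # files larger than N bytes, so negating both keeps the inclusive range.
    find "$dir" -type f ! -size "-${min_bytes}c" ! -size "+${max_bytes}c" | sort
}

# Example call (the directory is a placeholder): files from 1 KiB to 10 MiB
#   files_in_size_range "$HOME/Documents" 1024 10485760
```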
After indexing is complete, you will find the indexed directories and archives in the Search Scope pane. Enter the desired search terms in the search box. After you press the Search button, DocFetcher searches through the indexed data and lists the locations. Files containing the search term appear together with information such as the file size. Below you will find the text passages where the search term appears. DocFetcher highlights the term in yellow (refer to Figure 1).
Multiple Terms
In addition to the simple keyword search, DocFetcher also offers simultaneous searching for several keywords. You can also search for word sequences or specify terms to exclude from the search. If you want to search for two terms, join them with the AND operator. DocFetcher then finds files in which both terms occur, regardless of where they appear in the text. If you want the application to match an exact word order, put the words in quotes.
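The difference between combining two terms with AND and quoting them as a phrase can be imitated on a handful of plain-text files with grep. The two helpers below are hypothetical and only approximate the behavior described here: they scan files directly and ignore case, whereas DocFetcher answers such queries from its index.

```shell
#!/bin/sh
# Hypothetical helpers that imitate two query styles with grep on plain-text
# files; they are not how DocFetcher evaluates a query.

# List files that contain both words anywhere in the text (like: term1 AND term2).
files_with_both() {
    dir=$1; term1=$2; term2=$3
    grep -rilwF -- "$term1" "$dir" | sort | while IFS= read -r f; do
        if grep -qiwF -- "$term2" "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# List files that contain the words in exactly this order (like: "term1 term2").
# A phrase that wraps across a line break is not found by this sketch.
files_with_phrase() {
    dir=$1; phrase=$2
    grep -rilwF -- "$phrase" "$dir" | sort
}
```

A file containing "The kernel runs on Linux" is reported by files_with_both for the terms linux and kernel, but not by files_with_phrase for "linux kernel", because the words do not appear in that order.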
You can exclude a term from the search by prefixing it with a minus sign. For a wildcard search, use a question mark or asterisk. The question mark replaces exactly one character in a search term; the asterisk replaces any number of characters. The asterisk is particularly helpful when you search for compound nouns and technical terms.
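The two wildcards follow the same rules as the shell's own pattern matching, which makes them easy to try out. The sketch below uses a hypothetical helper, matches_wildcard, built on a case statement; it only mirrors the wildcard rules and is not how DocFetcher evaluates queries.

```shell
#!/bin/sh
# Hypothetical helper: test a word against a wildcard pattern using the
# shell's pattern matching, in which "?" stands for exactly one character
# and "*" for any number of characters.
matches_wildcard() {
    pattern=$1; word=$2
    # $pattern is deliberately unquoted so that ? and * act as wildcards.
    case $word in
        $pattern) return 0 ;;
        *)        return 1 ;;
    esac
}

for word in test text textbook; do
    if matches_wildcard 'te?t' "$word"; then echo "te?t matches $word"; fi
    if matches_wildcard 'te*' "$word"; then echo "te*  matches $word"; fi
done
```

Here te?t matches test and text but not textbook, whereas te* matches all three words.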
The search sometimes reveals results that are not needed at all. With the option to exclude unneeded formats, you can quickly thin out the list of hits. Uncheck the boxes to the left of the individual file formats in the Document Types window segment. Alternatively, use the Search Scope pane to limit the search to the relevant directory trees.
In the results display, you can scroll through the terms found page by page by clicking on the arrows to the left or right above the search display. The matches are shown with a yellow background. The up/down arrow buttons are used to navigate from match to match; DocFetcher highlights the search key in green.
Updates
As soon as you store new data in the directory hierarchies integrated by DocFetcher, you have to update the index to include all files in later searches. To update the index, right-click on the index in the search area and select the Update… option from the context menu. DocFetcher now integrates the new files and directories into the index in a process that is far faster than the initial indexing.
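If you are unsure whether an update is due, you can list the files that have changed since you last refreshed the index. The sketch below assumes that you touch a stamp file yourself after each update; both the stamp file and the helper files_changed_since are inventions of this example, not something DocFetcher creates or reads.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory that were modified
# after a reference file, for example a stamp file touched after the last
# index update.
files_changed_since() {
    dir=$1; stamp=$2
    find "$dir" -type f -newer "$stamp" | sort
}

# Example workflow (paths are placeholders):
#   files_changed_since "$HOME/Documents" "$HOME/.last-docfetcher-update"
#   touch "$HOME/.last-docfetcher-update"   # after updating the index
```

An empty result means that nothing in the directory tree has been modified since the stamp file was last touched.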
You can use the same context menu to list the documents in a folder without searching through them. Select the List Documents option. The software then displays the individual files in the results pane at the top right of the program window. You can only apply this function to a single directory, not to higher-level directories that only contain subdirectories themselves.
To remove individual files from a folder, right-click the file in the results and select Open Parent Folder from the context menu. The file manager then opens and lists the files in the parent folder, where you can delete them. Alternatively, you can display the folder contents by right-clicking on the directory in the lower-left corner of the search area and selecting Open Folder from the context menu.