Document management for the small office
Defying Chaos
Even in a small office, countless letters, email messages, and PDFs arrive daily. Document management systems help you avoid drowning in the flood of documents.
It's been more than a decade since the proclamation of the paperless office, with special document management systems (DMSs) proposed as the tool to manage arbitrary documents without miles of shelving. DMSs typically operate as client-server applications: Documents are stored in a database back end, and users access them through a client.
Most of these DMS applications are at home in medium to large enterprises and are hopelessly oversized for use in small home offices. Successfully using a DMS becomes even more difficult when the requirements include Linux support. Nevertheless, I searched for DMSs for Linux workstations that relieve the strain on small offices without time-consuming training and permanent maintenance. In my search, I've taken a look at Krystal DMS, LogicalDOC, Paperwork, and Referencer (see also the "Not Tested" box).
Not Tested
OpenKM [1] was intended to be the fifth candidate in this test. Although it has a Linux version (including a community release and commercial and cloud packages), the software proved to be extremely recalcitrant in our lab, with no usable installation routines for small offices or for less savvy admins and no current documentation. Instead, you are expected to install the required packages manually, one by one (including a Tomcat application server, a MySQL database, and applications such as ImageMagick and Ghostscript), and then edit complex configuration files, again by hand.
Although the manufacturer provides help documents, they are hopelessly out of date and caused attempted installations on current Linux distributions to fail. Some recent Linux versions also no longer offer the required packages. For Fedora and Red Hat Linux, the documentation refers to OpenOffice Suite 3.1.1, which was released August 31, 2009, and has seen countless new releases in the meantime.
The Debian and Ubuntu documentation also is out of date: It describes the configuration for the long-since-replaced SysVinit system but does not tell you how to handle the service units of the current systemd init system. The Apache web server configuration no longer works as described, either. For all of these reasons, I did not test OpenKM for this article.
Requirements
Ideally, the DMS should reproduce the workflow of a document starting with its creation, through its entire lifecycle, to final deletion. The DMS should handle not only printed documents, but also files that exist electronically in various formats (e.g., email).
The DMS does not just act as an archiving system for quick access to archived documents using keywords, date stamps, or other attributes. It also needs to optimize the flow of information in organizations by introducing distribution mechanisms for eligible recipients, document linking, and access monitoring.
A modular design should also ensure trouble-free processing of documents in third-party applications, including popular office suites or Enterprise Content Management (ECM) systems.
Multiplatform capability to allow the use of the client on mobile devices like tablets is also becoming increasingly important. Today, this also includes cloud connections for access to documents in the DMS independently of stationary IT. Last, but not least, regulatory requirements for archiving also need to be met wherever you are in the world.
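The basic archive requirement named above, quick access by keyword and date stamp, can be illustrated with a minimal Python sketch. The `Document` and `Archive` classes and their fields are hypothetical names chosen for this illustration; they do not describe how any of the tested products stores its data.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    """Hypothetical archive entry: a file path plus searchable attributes."""
    path: str
    received: date
    keywords: set = field(default_factory=set)


class Archive:
    """Toy in-memory archive with a keyword index (illustration only)."""

    def __init__(self):
        self._by_keyword = defaultdict(list)

    def add(self, doc):
        # Index each keyword in lowercase so lookups are case-insensitive.
        for keyword in doc.keywords:
            self._by_keyword[keyword.lower()].append(doc)

    def find(self, keyword, since=None):
        # Return documents tagged with the keyword, optionally limited
        # to those received on or after the given date.
        hits = self._by_keyword.get(keyword.lower(), [])
        return [d for d in hits if since is None or d.received >= since]
```

A real DMS would keep such an index in its database and add full-text search, access rights, and versioning on top; the sketch only shows why attributes assigned at filing time make later retrieval cheap.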
In the Small Office
Small offices do not typically require large DMSs that are usually difficult to install and configure and require regular maintenance on top. However, alternatives for small offices also need to handle input sources, such as printed documents, files of different formats, and stored email. Ideally, they should also include a scan engine that enables reading and text recognition of printed originals. Keywording and other storage functions are in the DMS's domain, as well as interfaces for the major office suites (see Table 1).
Table 1
Overview DMS Functions
| Function | Krystal DMS | LogicalDOC | Paperwork | Referencer |
|---|---|---|---|---|
| Modular design | Yes | Yes | Yes | No |
| Localization | Yes* | Yes | Yes | Yes |
| Client-server architecture | Yes | Yes | No | No |
| Web-based interface | Yes | Yes | No | No |
| Scanning module | Yes* | Yes* | Yes | No |
| Multiple sheet scanning | Yes* | Yes* | Yes | No |
| OCR module | Yes (external) | Yes (external) | Yes | No |
| Import function | Yes | Yes | Yes | Yes |
| Export function | Yes* | Yes | Yes | Yes (external) |
| Viewer | Yes | Yes | Yes | No |
| Indexing and searching | Yes | Yes | Yes | Yes |
| Version history | Yes | Yes | No | No |
| Comments | No | Yes | No | Yes |
| Cloud connection | No | Yes | No | Yes |
| Mobile apps | Yes | Yes | No | No |
| Link to CMS systems | No | Yes | No | No |

*Available only in the commercial versions.
Less relevant in small DMS solutions, however, are sophisticated mechanisms for granting rights and modules for interacting with major league ERP and ECM solutions. Also, the ability to use an app to access the DMS software from a mobile device, such as a tablet or smartphone, is less important in this working environment. What does matter for small offices and individual workstations, however, is easy installation and configuration of the software.
The Trouble with OCR
Reliable text recognition in scanned originals remains problematic on Linux. If the DMS applications do not have their own OCR modules, users are forced in many cases to rely on third-party solutions. In the Linux Magazine lab, we tested an OCR team consisting of Tesseract and gImageReader. The solution turned out to be technologically mature and therefore usable (see the "Tesseract and gImageReader" box).
Tesseract and gImageReader
Hewlett-Packard (HP) worked on the Tesseract [2] text recognition engine between 1985 and 1995. Development then lay dormant for 10 years because HP had abandoned this market segment. In 2005, HP released the code as open source; Google subsequently took over development and, after revising the code, made it available to the developer community as free software under the Apache license. Tesseract then spread throughout the Linux universe. Thanks to the modular design, Tesseract is also multilingual, and even German blackletter types are now recognized if you have the matching language modules in place. Not even foreign languages with many nonstandard characters pose unsolvable problems for the software.
Because OCR engines are typically command-line-only applications, third parties have developed various graphical interfaces over the years to make the programs easier to use. These GUI front ends often support one or more specific engines.
gImageReader [3] is a still relatively unknown front end for Tesseract OCR. In addition to ease of use, it promises a particularly lean design and therefore comes without unnecessary bells and whistles. Both software packages are available in the software repositories of the popular Linux distributions. You can thus install them at the push of a button on your flavor of Linux, then simply call the graphical front end, which automatically launches the OCR engine in the background, so you can scan originals and launch the recognition process (Figure 1).
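If you prefer to script the recognition step instead of using the graphical front end, the Tesseract command-line tool can be called directly. The following Python sketch assumes that the `tesseract` binary and the matching language data (for example, `deu` for German) are installed; the helper names `build_tesseract_command` and `ocr_to_text` are made up for this illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image, lang="eng"):
    """Return the argument list for a basic Tesseract run.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr_to_text(image, lang="eng"):
    """Run Tesseract on one image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    result = subprocess.run(
        build_tesseract_command(image, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

Calling `ocr_to_text("scan.png", lang="deu")` would then return the text of a scanned German letter as a string, ready to be handed to whatever keywording or indexing step follows; the file name is only a placeholder.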