Detect duplicates with fdupes

Double Trouble

Article from Issue 273/2023

The command-line fdupes tool helps you find duplicate files scattered across your directories.

Hard disks have an unpleasant tendency to fill up faster than expected, and it is not always immediately obvious why. The value of keeping things tidy should not be underestimated here: Untidy, poorly organized hard disks tend to fill up faster than well-organized ones. Because life is a mixture of order and chaos, most users probably face this problem.

Unexpectedly high disk usage is often caused by duplicate files. Typical candidates are photos, music, and videos, which can quickly occupy several gigabytes of space and are often difficult to track down. Linux offers several graphical applications to help you detect and remove such duplicates, and there are several more for the command line.

GUI or CLI?

Well-known tools with a graphical interface for this kind of cleanup include FSlint and dupeGuru. In this article, I will look at fdupes for the command line [1], first released in 2000. Most distributions include the tool, which weighs in at just over 100KB, in their repositories; you can install it using your distribution's package manager of choice. Listing 1 shows a guide for Debian, Fedora, and Arch Linux.

Listing 1

Installing fdupes

# Debian/Ubuntu
$ sudo apt install fdupes
# Fedora
$ sudo dnf install fdupes
# Arch Linux
$ sudo pacman -S fdupes

The current 2.2.1 version from September 2022 has not made its way into all repositories [2]. If you want to compile fdupes from the source code, you can use the tarball from GitHub. After unpacking, just follow the familiar three-step process of ./configure, make, and make install. As of fdupes 2.0, there are two dependencies that you may also need to resolve yourself, depending on the distribution. To do this, follow the instructions in the INSTALL file from the unpacked archive.
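
Compiling from source then boils down to the following steps (the archive name here is just an example; the two dependencies mentioned above are typically the ncurses and PCRE2 development packages, whose exact names vary by distribution):

$ tar xzf fdupes-2.2.1.tar.gz
$ cd fdupes-2.2.1
# resolve the build dependencies first (see the INSTALL file)
$ ./configure
$ make
$ sudo make install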

After the install, you can use the tool immediately without any configuration. It identifies duplicate files in the specified directories in several steps. The file name plays no role in detecting a duplicate. Instead, two files must first be the same size; if they are, fdupes compares their MD5 checksums. Finally, the software performs a byte-by-byte comparison to make sure that the files really are identical.
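
You can retrace these three checks manually in the shell to get a feel for the approach; this is purely illustrative rather than how fdupes works internally, and the two file names are made up:

$ stat -c %s file1.txt file2.txt      # step 1: the file sizes must match
$ md5sum file1.txt file2.txt          # step 2: the MD5 checksums must match
$ cmp -s file1.txt file2.txt && echo "identical"   # step 3: byte-by-byte comparison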

Fdupes has numerous options that let you control the search and the subsequent deduplication. Initially, you will want to familiarize yourself with the tool by running the fdupes --help command. This will help you identify the options that suit your use case.

Test Run

For the test, I created an fdupes directory in the Documents directory and then created 10 text files whose content read fdupes finds and removes duplicates. Listing 2 shows you how to do this quickly.

Listing 2

Create Multiple Text Files at the Same Time

$ cd ~/Documents/fdupes
$ echo "fdupes finds and removes duplicates" | tee file{1..10}.txt

A quick ls -l confirms that the files were created. The easiest way to search for duplicates in the new directory is the fdupes ~/Documents/fdupes command (Figure 1). By separating the paths with spaces, you can specify multiple directories at the same time. To search directories recursively, you need the -r option, as in fdupes -r ~/Documents (Figure 2). In this case, the tool finds my 10 text files along with some other duplicates. The most common call variants are summarized below the figures.

Figure 1: The simplest way of finding duplicates requires no parameters other than the directory to search.
Figure 2: Use the -r parameter to dig down further in a directory tree search.
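
To summarize the call variants from this example (the ~/Downloads directory in the second command is just an illustration):

$ fdupes ~/Documents/fdupes          # search a single directory
$ fdupes ~/Documents ~/Downloads     # search several directories at once
$ fdupes -r ~/Documents              # also descend into subdirectories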

The -S (--size) option shows you the size of the matching files. You can use -t or --time to find out when a file was last modified. The -G (--minsize=SIZE) and -L (--maxsize=SIZE) options let you narrow down the selection further.
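
A combined call might look like the following; the size is specified in bytes, and the 1MB threshold is just an example:

$ fdupes -r -S -t --minsize=1048576 ~/Documents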

Be Careful When Removing

But finding is only the first part of the task; after all, the goal is to delete the duplicates and free up disk space. This is where the -d (--delete) option comes in. When using -d, always make sure that your path specification is correct, because files deleted with fdupes cannot be recovered. The command

fdupes -d ~/documents/fdupes

first lists the files in a numbered list (Figure 3). Note that the number at the beginning of a line will not necessarily match the number in the file name. If you now enter the numbers of the files you want to keep, separated by commas, fdupes tags them with a plus sign and leaves them intact, while it removes all of the duplicates tagged with a minus sign.

Figure 3: The -d parameter deletes any duplicates found, first presenting them in a numbered list.

If you make a mistake, the rg command cancels your previous entries. Pressing Delete applies your entries. If you want to remove all duplicates except the first one displayed, use the command

fdupes -r -d -N /path

You do not need to press Delete here; the -N (--noprompt) option deletes without asking for confirmation.

Another way of selecting files after calling fdupes with the -d option is the sel command. You can select all files with a specific term anywhere in the path by typing sel <term>. To select all files whose path starts with the term, use selb <term>. Use sele <term> to select files whose path ends with the term. To select all files whose path matches the term exactly, use selm <term>. After that, you can decide which of the candidates you want to keep. Further options are described by the help command, which displays the matching sections of the fdupes man page.
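
A few concrete examples with made-up search terms show what this looks like at the interactive prompt:

sel vacation      (selects files with "vacation" anywhere in the path)
selb /home/tux    (selects files whose path begins with /home/tux)
sele .jpg         (selects files whose path ends in .jpg)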
