Carving tools help you recover deleted files
Undeleted
Modern filesystems make forensic file recovery much more difficult. Tools like Foremost and Scalpel identify data structures and carve files from a hard disk image.
IT experts and investigators have many reasons for reconstructing deleted files. Whether an intruder has deleted a log to conceal an attack or a user has destroyed a digital photo collection with an accidental rm -rf, you might someday face the need to recover deleted data. In the past, recovery experts could easily retrieve a lost file because an earlier generation of filesystems simply deleted the directory entry. The meta information that described the physical location of the data on the disk was preserved, and tools like The Coroner's Toolkit (TCT [1]) and The Sleuth Kit (TSK [2]) could uncover the information necessary for restoring the file.
Today, many filesystems delete the full set of meta information and leave only the data blocks behind. Putting these pieces together correctly is called file carving: forensic experts carve the raw data off the disk and reconstruct the files from it. The more fragmented the filesystem, the harder this task becomes.
Many open source tools automate the carving process. The list is headed by Foremost [3] and its derivative Scalpel [4], but other tools include PhotoRec [5] and FTimes [6]. PhotoRec does not support generic carving for any file type, and FTimes is so hard to use that it is not worthwhile for most users.
Foremost and Scalpel are not interested in the underlying filesystem. They simply expect the data blocks of the files to reside sequentially in the image under investigation. The tools will find files in dd dumps, RAM dumps, or swap files. Carving will help to identify and reconstruct files on corrupt filesystems, in slack space, or even after installation of a new operating system, as long as the required data blocks still exist.
Of course, none of these tools can perform miracles, and they are not designed to retrieve data from physically damaged hard disks. Also, the carving process cannot access data blocks that have been overwritten.
Because carving tools do not rely on the filesystem, they need other sources of information to discover where a file starts and ends. Fortunately, many file types have known structures. The header and footer are often all that is needed to identify the file type and location. The Linux file command also uses header and footer information to identify file types.
File carvers investigate the whole hard disk, or disk image, to locate known headers and footers. They then carve out the blocks between the header and footer and store the data as a new file. Some file types do not possess unique footers. In that case, carvers can at least guess where the file ends based on where the next header starts. Of course, any amount of unidentified data could reside between the end of the file and the next header.
To avoid collecting unnecessary junk data, carving programs allow users to set maximum file sizes. Unfortunately, headers and footers are often short, which leads to numerous false positives.
Image formats are an exception. For example, each JPEG file starts with a byte sequence of 0xFFD8, typically followed by 0xFFE00010. File carvers are thus very good at identifying JPEG images. However, if some blocks have been overwritten, or if the file is fragmented, the tools will restore only a part of the file at best (Figure 1).
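The header and footer search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the way Foremost or Scalpel work internally: the function name carve is hypothetical, the whole image is assumed to fit in memory, and the footer 0xFFD9 (the JPEG end-of-image marker) is taken from the JPEG format rather than from this article. If no footer turns up within the maximum size, the sketch falls back to the next header or to the size limit.

```python
# JPEG header as described in the text: 0xFFD8 followed by 0xFFE00010.
JPEG_HEADER = b"\xff\xd8\xff\xe0\x00\x10"
# JPEG end-of-image marker (assumption: taken from the JPEG format).
JPEG_FOOTER = b"\xff\xd9"


def carve(data, header, footer, max_size):
    """Return a list of (start, end) byte ranges that look like files."""
    found = []
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        # Never look further than max_size bytes past the header.
        limit = min(len(data), start + max_size)
        foot = data.find(footer, start + len(header), limit)
        if foot != -1:
            end = foot + len(footer)
        else:
            # No footer in range: guess the end from the next header,
            # or give up at the size limit.
            nxt = data.find(header, start + len(header), limit)
            end = nxt if nxt != -1 else limit
        found.append((start, end))
        pos = end
    return found


if __name__ == "__main__":
    # Synthetic "image": one complete JPEG-like file, one without a footer.
    image = (b"junk" + JPEG_HEADER + b"AAAA" + JPEG_FOOTER
             + b"more" + JPEG_HEADER + b"BBBB")
    for start, end in carve(image, JPEG_HEADER, JPEG_FOOTER, 1000):
        print(start, end, image[start:end])
```

The second hit in the sample data has no footer, so its end is only a guess. This is exactly the situation in which a carver collects junk data or truncates a fragmented file.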
Foremost and Scalpel
Jesse Kornblum and Kris Kendall from the United States Air Force Office of Special Investigations developed Foremost in March 2001 as a tool for analyzing and recovering deleted files. The Foremost carving tool is inspired by an earlier program called CarvThis, which was created back in 1999 by the Defense Computer Forensic Lab but never released to the general public. Foremost is now open source, and Nick Mikus maintains the source code after giving the program a major boost as part of his master's degree.
Golden G. Richard III developed a separate program dubbed Scalpel based on Foremost 0.69. For a long time, Scalpel was regarded as an advanced tool. Some sources even claim that the Foremost developers recommend Scalpel themselves [7]. To be more accurate, both projects are under active development. Although Scalpel was far superior to its predecessor in 2005 – with the ability to analyze images around 10 times faster – Foremost has caught up recently thanks to Nick Mikus, and it is actually superior to its derivative for some tasks.
Both Foremost and Scalpel use configuration files to specify which files to search for (Listing 1). The first column designates the file type and also specifies the file extension to add to any files the program finds. Column two contains y if the header and footer are case sensitive and n otherwise. The next column defines the maximum file size, followed by the header byte sequence, and the footer byte sequence if it exists. The \x string introduces a byte in hexadecimal notation; the other possibilities are \s for a space and ? as a wildcard for any character. Other options can follow at the end.
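To make the column layout and the pattern notation more concrete, the following Python sketch translates one configuration line into a searchable rule. It is an illustration of the format as described here, not code from Foremost or Scalpel. The function names decode_pattern and parse_rule are hypothetical, the sample lines are illustrative rather than quoted from a shipped configuration file, and the sketch assumes that a fifth column, if present, is the footer; trailing options are ignored.

```python
import re


def decode_pattern(spec):
    """Translate a header/footer spec into a bytes regex pattern."""
    parts = []
    i = 0
    while i < len(spec):
        if spec.startswith("\\x", i):
            # \xNN: one byte in hexadecimal notation
            parts.append(re.escape(bytes([int(spec[i + 2:i + 4], 16)])))
            i += 4
        elif spec.startswith("\\s", i):
            # \s: a literal space
            parts.append(re.escape(b" "))
            i += 2
        elif spec[i] == "?":
            # ?: wildcard for any single byte
            parts.append(b".")
            i += 1
        else:
            parts.append(re.escape(spec[i].encode("latin-1")))
            i += 1
    return b"".join(parts)


def parse_rule(line):
    """Split one configuration line into its columns."""
    fields = line.split()
    extension, case, max_size, header = fields[:4]
    # Assumption: column five, if present, is the footer.
    footer = fields[4] if len(fields) > 4 else None
    # DOTALL lets the wildcard match newline bytes as well.
    flags = re.DOTALL | (0 if case == "y" else re.IGNORECASE)
    return {
        "extension": extension,
        "max_size": int(max_size),
        "header": re.compile(decode_pattern(header), flags),
        "footer": re.compile(decode_pattern(footer), flags) if footer else None,
    }


if __name__ == "__main__":
    rule = parse_rule(r"jpg y 20000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9")
    match = rule["header"].search(b"xx\xff\xd8\xff\xe0\x00\x10yy")
    print(rule["extension"], rule["max_size"], match.start())
```

Compiling the patterns to regular expressions is a convenience for the sketch; a production carver would more likely use a dedicated multi-pattern search for speed.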
Listing 1
Configuration
Fast Finder
Because of its origins, Scalpel uses the same configuration file as Foremost, although the two tools work differently internally. Both tools find more or less the same files, but there are some discrepancies in file identification. Forensic experts are thus well advised to use both programs.
Versions 0.9.1 and later of Foremost use a new approach to identifying ZIP, JPEG, Office, and other formats. The formats are implemented directly in Foremost, meaning that the program does not need header and footer information in the configuration file for the identification process. Foremost enables this new detection function if you set the -t flag at the command line followed by the required file types:
foremost -T -t jpg,gif,pdf -i imagefile
Supported formats are listed in Table 1. To enable all of these built-ins, just set the -t all option. The previous command line also sets the -T option to tell Foremost to write any files it finds to a directory that uses a name with a timestamp. This makes it easier to organize the forensic investigation, in that each new run writes its results to a new directory.
Space Requirements
The possibility of false positives means that the carver identifies a huge amount of data, so make sure you have enough free space on the target filesystem. The carving process doesn't necessarily require large amounts of copying. Virtual filesystems, such as CarvFS [8], are designed to access the data directly from the original image. CarvFS, which is based on FUSE (Filesystem in Userspace), only expects the carving tool to provide a table that describes which files are available at which physical locations. The CarvFS filesystem originated with the Dutch police's Open Computer Forensics Architecture (OCFA) project (see the article on OCFA in this issue), and it is intended for situations in which copying all the files to a separate location would result in huge volumes of data. In other cases, however, copying the data is more efficient than accessing it from the original image.
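The idea of describing carved files by location instead of copying them can be sketched as follows. This is not CarvFS code and does not use FUSE; the Extent class and the read_extent function are hypothetical names, and the sketch assumes that each carved file occupies a single contiguous range in the image. The table of extents is all that needs to be stored; the bytes are read from the original image only when someone asks for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One entry in the location table: where a carved file lives."""
    name: str
    offset: int   # byte offset of the file inside the image
    length: int   # number of bytes that belong to the file


def read_extent(image_path, extent):
    """Read a carved file directly from the image, without copying it out."""
    with open(image_path, "rb") as image:
        image.seek(extent.offset)
        return image.read(extent.length)


if __name__ == "__main__":
    import os
    import tempfile

    # Build a tiny stand-in image so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"....first-file....second-file....")
        path = tmp.name

    table = [Extent("00000000.bin", 4, 10), Extent("00000001.bin", 18, 11)]
    for entry in table:
        print(entry.name, read_extent(path, entry))
    os.remove(path)
```

The trade-off matches the one described above: the table costs almost no space, but every access goes back to the original image, so copying can still be faster when the same files are read many times.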
A typical Foremost run without built-ins is shown in Listing 2. The image for this example comes courtesy of the Digital Forensic Research Workshop (DFRWS [9]) challenge. DFRWS ran this competition in 2006 to test file carvers and promote their development. At the end of the competition, the organizers published a list of the files in the image.
Listing 2
Foremost Run