A Deep Dive into the ELF File Format
Elf Quest
Linux and other Unix-based systems use the ELF file format for executables, object code, and shared libraries. Take a peek inside to learn how an ELF file is organized.
An object file is a black box that you can never really see inside. You can't just open it and read it. We all have some understanding that it contains instructions for the computer, coded in a format that only a computer can understand, but we're used to thinking of compiled program code as something of a mystery. If you take a closer look, however, the structure is not as opaque as you might imagine.
Linux, like most other Unix-like operating systems, stores program code in the Executable and Linkable Format (ELF). ELF superseded the less-flexible a.out format and has been the standard in Linux since 1995 and in FreeBSD [1] since 1998. An ELF object file contains the processor-specific instructions that the CPU actually executes. It also contains other information relating to
- how and where to load the object into memory
- dynamic linking
- any public functions exported (or imported) by the object
- debugging
Much of this functionality is only documented in the source code for kernels and toolchains themselves or in various obscure parts of the web. Understanding the ELF format is essential for anyone writing a compiler, assembler, or linker for the Linux platform.
The best way to study the structure of an ELF file is to watch one come together. In this article, I'll show how to construct an extensible ELF file from scratch using flat assembler (fasm) [2], an x86 macro-assembler. Fasm supports the creation of "flat" binary files, meaning that the programmer determines (broadly speaking) every byte that appears in the output file. The ELF file will run on both Linux and FreeBSD; the full code is about 500 lines, and you can download it from the associated Git repository [3]. You'll need to install fasm from your operating system's package repository.
In your daily life, it is rarely necessary to build an ELF file manually. This exercise is intended for purposes of illustration. In the real world, a developer tool such as a compiler generates the ELF file based on the contents of source code and information provided by the developer.
The Layout of an ELF File
An ELF file usually has four main parts:
- the ELF header
- one or more program headers
- one or more section headers
- one or more sections
Assembly languages operate at a lower level than a language such as C, so assembly language corresponds more closely with the commands the computer actually executes. I'll therefore use assembly language as a means for describing the ELF format. Figure 1 provides a graphical representation of an ELF file and shows how this file relates to the ELF image when loaded into memory.
The ELF Header
In Listing 1, I declare the ELF header (for those new to assembly language, db means "declare byte," dw means "declare word" (two bytes), and so on; rept n {} repeats the enclosed code n times).
Listing 1
The ELF Header
; Magic.
db 0x7f,"ELF"
; Class (32- or 64-bit).
db ELFCLASS64
; Endianness (least significant bytes first).
db ELFDATA2LSB
; Version of the ELF spec.
db EV_CURRENT
; ABI (Application Binary Interface) -
; we use ELFOSABI_LINUX = 3 or
; ELFOSABI_FREEBSD = 9
if OS eq "Linux"
  db ELFOSABI_LINUX
else if OS eq "FreeBSD"
  db ELFOSABI_FREEBSD
end if
; ABI version (always 0).
db 0
rept 7 {db 0} ; Padding.
; Executable type (2) could also be
; ET_REL = 1 for a relocatable object or
; ET_DYN = 3 for a shared library.
dw ET_EXEC
; Machine is x86-64.
dw EM_AMD64
; File version (always set to EV_CURRENT = 1).
dd EV_CURRENT
; Entry point.
dq LOAD_BASE + PLANE1 + main
; Program headers offset.
dq PROGRAM_HEADERS
; Section headers offset.
dq SECTION_HEADERS
; Flags (always 0).
dd 0
; Size of this ELF header.
dw ELF_HEADER_SIZE
; Size of one program header (56 bytes).
dw SIZEOF_PROGRAM_HEADER
dw NUM_PROGRAM_HEADERS
; Size of one section header (64 bytes).
dw SIZEOF_SECTION_HEADER
dw NUM_SECTION_HEADERS
; This is the section header table index
; of the entry associated with the
; section name string table.
dw SHSTRTAB_INDEX
ELF_HEADER_SIZE = $
As you can see in Listing 1, the ELF header contains general information describing the program's context. For instance, the header contains settings for the class (32-bit or 64-bit), the endianness, the target operating system ABI, the file type (executable, relocatable object, or shared library), and the processor. The ELF header also records the file offsets, entry sizes, and entry counts of the program and section header tables, which you'll learn about later in this article.
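To make the byte layout concrete, the following sketch packs the same fields with Python's struct module rather than fasm, so it runs without the article's repository. It is only an illustration under stated assumptions: the function name build_elf_header and the sample offsets in the usage note are hypothetical, and EM_X86_64 = 62 is the ELF specification's name for the machine constant that the listing presumably defines as EM_AMD64.

```python
import struct

# Standard values from the ELF specification.
ELFCLASS64 = 2
ELFDATA2LSB = 1
EV_CURRENT = 1
ELFOSABI_LINUX = 3
ET_EXEC = 2
EM_X86_64 = 62  # assumed to match the listing's EM_AMD64

# Little-endian ELF64 header: 16-byte e_ident, then type, machine,
# version, entry, phoff, shoff, flags, and six 16-bit size/count fields.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def build_elf_header(entry, phoff, shoff, phnum, shnum, shstrndx):
    """Pack an ELF64 header for an x86-64 Linux executable (hypothetical helper)."""
    ident = (b"\x7fELF"
             + bytes([ELFCLASS64, ELFDATA2LSB, EV_CURRENT, ELFOSABI_LINUX, 0])
             + bytes(7))  # ABI version 0, then 7 bytes of padding
    return ELF64_HEADER.pack(
        ident, ET_EXEC, EM_X86_64, EV_CURRENT,
        entry, phoff, shoff,
        0,                  # flags
        ELF64_HEADER.size,  # size of this header
        56, phnum,          # size and number of program headers
        64, shnum,          # size and number of section headers
        shstrndx)
```

Calling build_elf_header(0x401000, 64, 0x2000, 2, 5, 4) with these made-up offsets returns a 64-byte string that starts with the magic bytes 0x7f "ELF", which is the value ELF_HEADER_SIZE takes in a 64-bit file.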
Program Headers
Each program header describes a memory segment in the final loaded image. Program headers don't have any raw data associated with them; they just tell the loader to allocate memory that will be populated with data from one or more sections. Segments can overlap; you can use the overlap to mark the .dynamic section as both PT_DYNAMIC and PT_LOAD. Because I'm declaring multiple segments, I wrap the declarations in a macro (Listing 2).
Listing 2
The Program Header Macro
macro PROGRAM_HEADER type,permissions,offset,virtual_address,physical_address,disk_size,mem_size,alignment
{
  ; Segment type - common types are
  ; PT_NULL = 0; PT_LOAD = 1;
  ; PT_DYNAMIC = 2; PT_INTERP = 3;
  ; PT_PHDR = 6.
  dd type
  ; Permissions - readable/writable/
  ; executable: an ORed combination
  ; of values from PF_R = 0x4; PF_W =
  ; 0x2; PF_X = 0x1.
  dd permissions
  ; Offset in file.
  dq offset
  ; Offset at runtime.
  dq virtual_address
  ; Unused; set to 0.
  dq physical_address
  dq disk_size
  dq mem_size
  dq alignment
  ; The following variable lets us count
  ; program headers as we declare them.
  CPROGRAM_HEADER = CPROGRAM_HEADER + 1
}
I can then declare a program header with a single macro invocation:
PROGRAM_HEADER PT_LOAD, PF_R or PF_X, SECTION_TEXT, LOAD_BASE + PLANE1 + SECTION_TEXT, 0, TEXT_PLUS_PLT_SIZE, TEXT_PLUS_PLT_SIZE, 0x1000
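As a rough cross-check of the 56-byte layout the macro emits, here is the same record packed with Python's struct module. This is a sketch, not the article's code: the function name program_header is hypothetical, and the load base, text offset, and size in the usage note are invented stand-ins for LOAD_BASE, PLANE1, SECTION_TEXT, and TEXT_PLUS_PLT_SIZE, whose real values live in the repository.

```python
import struct

# Standard values from the ELF specification.
PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

# Little-endian ELF64 program header, in the macro's field order:
# type, permissions, offset, virtual address, physical address,
# size on disk, size in memory, alignment.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def program_header(p_type, permissions, offset, virtual_address,
                   physical_address, disk_size, mem_size, alignment):
    """Pack one ELF64 program header (hypothetical helper)."""
    return ELF64_PHDR.pack(p_type, permissions, offset, virtual_address,
                           physical_address, disk_size, mem_size, alignment)
```

With assumed values, program_header(PT_LOAD, PF_R | PF_X, 0x1000, 0x400000 + 0x1000, 0, 0x200, 0x200, 0x1000) describes a readable, executable segment. Note that Python spells the permission combination with | where fasm uses or.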
A segment's offset and alignment must obey the following constraint:
offset mod alignment == virtual_address mod alignment && offset mod page_size == virtual_address mod page_size
(where mod is the integer modulo operator and page_size is usually 4096 for Linux and FreeBSD). The PLANE* constants in the source are there to maintain an appropriate minimum distance of page_size between segments.