Document conversion from the command line
Command Line – Pandoc
Pandoc lets you convert files from one markup format to another at the command line.
A strength of free software is that applications usually have everything users need for a specific purpose, a tendency that is especially strong in apps for KDE and the command line. Pandoc [1], a universal document converter, exemplifies this strength.
First released in 2006 by John MacFarlane, a philosophy professor at the University of California, Berkeley, Pandoc is a Haskell library for converting between text formats, especially those using a markup format (Table 1). In effect, it is an all-in-one replacement for the dozens of scripts that exist in many distributions for the same purpose.
Pandoc is not equipped to precisely convert complicated layout, such as margins and tables, in formats like PDF or Open Document Format (ODF). However, templates can be created for different formats. Sometimes, though, converting content alone is far better than converting nothing at all. Moreover, in many cases, Pandoc is adequate for simply formatted documents, like articles or essays, especially when they are written in a markup language. It also has advanced features for slide shows, citations, and bibliographies.
By default, Pandoc writes a document fragment to standard output (Figure 1). You specify the input format with -f, the output format with -t, and then the input file:
pandoc -f markdown -t latex pandoc.txt
The result is a fragment in the format specified that can be pasted into another document. To save the output, you must specify a file using the --output
(-o
) option. If you want a complete file, rather than a fragment, add the --standalone
option. As with many command-line tools, saving to a file produces no output unless something goes wrong.
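As a minimal sketch of the difference between a fragment and a complete file, the following commands convert a small sample file three ways. The file names (sample.md, fragment.html, complete.html) are hypothetical and chosen only for illustration.

```shell
# Create a small Markdown source file; the leading "% ..." line is a Pandoc title block.
# (sample.md is a hypothetical file name.)
printf '%% Sample title\n\nSome *emphasized* text.\n' > sample.md

# Fragment: the converted HTML is printed to standard output.
pandoc -f markdown -t html sample.md

# Fragment saved to a file with -o; nothing is printed on success.
pandoc -f markdown -t html -o fragment.html sample.md

# Complete document, including the HTML header, with --standalone.
pandoc -f markdown -t html --standalone -o complete.html sample.md
```

Comparing fragment.html and complete.html in a text editor shows what --standalone adds around the converted content.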
If you do not specify the input and output formats, Pandoc will attempt to guess them from the file extensions. To ensure formatting, a template file can be specified (see the Templates section below). Run Pandoc with --list-input-formats or --list-output-formats
to list the supported formats. If multiple input files are specified, they are concatenated into a single output file with a blank line between the contents of each input file.
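The following sketch shows format guessing and concatenation together. The file names are hypothetical, and the last command assumes a Pandoc release recent enough to offer --list-output-formats.

```shell
# Two small input files (hypothetical names).
printf '# Part one\n' > part1.md
printf '# Part two\n' > part2.md

# No -f or -t: the formats are inferred from the .md and .html extensions,
# and both inputs are concatenated before conversion.
pandoc -o combined.html part1.md part2.md

# Print the output formats this Pandoc build can write.
pandoc --list-output-formats
```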
Templates
Each supported format has a default template stored in /usr/share/pandoc/data/templates/
. Most follow the naming structure default.FORMAT
. Exceptions include ODT's template, which is named default.opendocument
, and PDF, which shares the default.latex
template. In addition, EPUB uses epub-page.html
, epub-coverimage.html
, and epub-titlepage.html
. You can view the default template using the command pandoc -D FORMAT
(Figure 2).
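One convenient way to start a custom template is to save the default one to a file and edit the copy. In this sketch, my-template.html is a hypothetical file name.

```shell
# Write the default HTML template to a file that can be edited freely.
# (my-template.html is a hypothetical name.)
pandoc -D html > my-template.html
```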
You can write or download custom templates [2] or modify copies of existing templates [3] if the default template does not meet your needs. Templates consist of fixed text and may include variables that are replaced by elements of the source file, often automatically. For example, the $title$ variable in <title>$title$</title>
is replaced automatically by the source file's title. More advanced users can include conditional (if/else) statements. For a full description of custom templates, see Pandoc's man page and user guide [4].
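To illustrate variables and conditionals, here is a deliberately small template sketch, not one of Pandoc's own templates. The file names minimal.html, demo.md, and demo.html are hypothetical; $title$ and $body$ are standard template variables, and --template selects the custom file.

```shell
# A minimal custom HTML template (hypothetical file minimal.html).
# $title$ and $body$ are filled in by Pandoc; the $if(title)$ block
# is emitted only when the source document defines a title.
cat > minimal.html <<'EOF'
<!DOCTYPE html>
<html>
<head><title>$title$</title></head>
<body>
$if(title)$
<h1>$title$</h1>
$endif$
$body$
</body>
</html>
EOF

# A source file whose first line is a Pandoc title block.
printf '%% Template demo\n\nBody text.\n' > demo.md

# Convert using the custom template instead of the default one.
pandoc -f markdown -t html --template=minimal.html -o demo.html demo.md
```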
In the end, if content is more important than structure, you can generally use the default templates without tweaking them.
Note that converting to PDF still requires additional software: By default, Pandoc generates PDF by way of LaTeX, so a LaTeX engine such as pdflatex must be installed.
Input/Output Options
Instead of templates, you can do some formatting using options. To eliminate any ambiguity in the command structure, you can specify the input format with --from FORMAT
(-f FORMAT
) or --read FORMAT
(-r FORMAT
), and the output with --to FORMAT
(-t FORMAT
) or --write FORMAT
(-w FORMAT
). Similarly, although Pandoc by default looks for user data files, such as custom templates, in ~/.pandoc
, you can specify another directory with --data-dir=DIRECTORY
.
Other options affect the internal formatting. For instance, while the default is to convert tabs to spaces, --preserve-tabs
(-p
) will override the default. When setting up tabs, you may also use --tab-stop=NUMBER
to change the default four spaces used for tabs. You can also use --base-header-level=NUMBER
to set the first heading level to use and --smart
(-S
) to use typographic characters such as curly quotes and em dashes (instead of straight quotes and runs of hyphens).
Individual formats also have their own formatting options. For instance, in HTML output, --section-divs
wraps sections in <div>
tags (or <section>
tags in HTML5), which can be formatted with CSS style sheets created outside Pandoc. LaTeX, ConTeXt, and DocBook output can use --chapters
to convert the top-level headings into chapters, while --no-tex-ligatures
suppresses ligatures in LaTeX or ConTeXt output, which can be convenient with some recent OpenType features. More generally, several options are intended primarily for code, such as the self-explanatory --no-wrap
, --columns=NUMBER
, --no-highlight
, and --highlight-style=STYLE
(with options of pygments
, kate
, monochrome
, espresso
, zenburn
, haddock
, and tango
). Many of these options can reside in a single file that is specified with --defaults=FILE
, eliminating the need to continually structure a detailed command.
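A defaults file is written in YAML, with keys that mirror the long option names. The sketch below assumes a Pandoc release recent enough to support --defaults; the file names article.yaml, article.md, and article.html are hypothetical.

```shell
# A small defaults file (hypothetical name article.yaml).
# Each key corresponds to a command-line option of the same name.
cat > article.yaml <<'EOF'
from: markdown
to: html
standalone: true
output-file: article.html
EOF

# A sample source file with a title block.
printf '%% Defaults demo\n\nText of the article.\n' > article.md

# All options come from the defaults file; only the input is named here.
pandoc --defaults=article.yaml article.md
```

Keeping one defaults file per kind of document means the command line stays short no matter how many options accumulate.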
For many output formats, these options cover most formatting needs, with the exception of spacing and layout. However, layout can be added via CSS style sheets linked with --css=URL
. Some output formats have specific options for style sheets, such as --reference-odt=FILE
(ODT), --reference-docx=FILE
(DOCX), and --epub-stylesheet=FILE
(EPUB). If you regularly convert to such formats, developing a style sheet may be worth the effort. You may even find a style sheet online that you can use with little or no modification.
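As a small illustration of linking a style sheet to HTML output, the sketch below uses hypothetical file names (style.css, page.md, page.html) and a single arbitrary CSS rule.

```shell
# A one-rule style sheet (hypothetical content and file name).
cat > style.css <<'EOF'
body { max-width: 40em; margin: 2em auto; line-height: 1.5; }
EOF

# A sample source file with a title block.
printf '%% Styled page\n\nA paragraph with extra breathing room.\n' > page.md

# --css adds a link to the style sheet in the header of the standalone page.
pandoc -f markdown -t html --standalone --css=style.css -o page.html page.md
```

Because the style sheet is only linked, not embedded, style.css must travel with page.html.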
Special Uses
Besides routine format conversion, Pandoc has several special uses. For instance, Pandoc supports several slide show applications, including PowerPoint. However, to judge by the available options, its main emphasis is on Beamer, a LaTeX-based presentation application [5]. The markup for a Beamer slide is as simple as starting each one with ##
. To Beamer's own thorough array of features, Pandoc adds options of its own. While converting a file for use in Beamer, Pandoc can define a logo, title graphics, navigation symbols, the Beamer theme, and the aspect ratio for slides. Other supported features include slide backgrounds, transitions, and lists in which items are displayed one at a time. There is even an option to pass additional Beamer options to the converted presentation. In addition, Pandoc can convert a Beamer presentation to an article. Because slides are written in Markdown, Pandoc provides a professional slide show tool regardless of the office suite used.
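The slide markup can be tried without a LaTeX installation by writing Beamer's LaTeX source rather than a PDF. In this sketch, slides.md and slides.tex are hypothetical file names, and the slide content is invented for illustration.

```shell
# Two slides, each introduced by a level-two heading (hypothetical content).
printf '## First slide\n\n- one point\n- another point\n\n## Second slide\n\nClosing remarks.\n' > slides.md

# Write the Beamer frames as LaTeX source; each ## heading starts a frame.
pandoc -f markdown -t beamer -o slides.tex slides.md

# With a LaTeX installation, the same source could go straight to PDF slides:
#   pandoc -f markdown -t beamer -o slides.pdf slides.md
```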
Pandoc also has extensive support for citations and bibliographies. Using the option --citeproc
, Pandoc can generate citations from a source file and a bibliographic database specified with one --bibliography=FILE
for each bibliography used. BibLaTeX (.bib
), BibTeX (.bibtex
), CSL JSON (.json
), and CSL YAML (.yaml
) are all supported formats. By default, Pandoc uses the Chicago Manual of Style citation style, although other citation formats can also be defined. There is even a --citation-abbreviations=FILE
option that can define abbreviations for frequently used titles. Because the bibliography is kept separate from the source files, it is easy to update it and then generate a new file.
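The following sketch puts the citation workflow together. It assumes a Pandoc release that provides the --citeproc option; the bibliography entry, the citation key doe2020, and the file names are all invented for illustration.

```shell
# A one-entry bibliography database (entirely hypothetical reference).
cat > refs.bib <<'EOF'
@book{doe2020,
  author    = {Doe, Jane},
  title     = {An Example Book},
  publisher = {Example Press},
  address   = {Springfield},
  year      = {2020}
}
EOF

# A source file that cites the entry by its key with [@key].
printf 'Pandoc can cite sources [@doe2020].\n' > cited.md

# Resolve the citation and append a bibliography, using the default style.
pandoc --citeproc --bibliography=refs.bib -f markdown -t plain -o cited.txt cited.md
```

Updating refs.bib and rerunning the last command regenerates both the in-text citation and the reference list.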