Converting HTML to PDF from the shell

Post by **step** » Mon Sep 02, 2024 7:11 pm

Using wkhtmltopdf

wkhtmltopdf is no longer under active development, but it still works well. It converts HTML to PDF using the WebKit rendering engine, from local files or online pages. I've only tested it briefly with a local HTML file on Fatdog 903.

Download the AMD64 static binary package for Debian bullseye from the downloads page of the project https://github.com/wkhtmltopdf/wkhtmlto ... wnloads.md. Package size: 34 MB.
Convert the downloaded .deb file to a Fatdog64 package and install the resulting package. You can do both things from the right-click menu. Installed size: 3 * 43 MB.
Install the openssl-compat 1.1.1u package from Gslapt.

You should be good to go. With test file /tmp/x.html:

Code:

# wkhtmltopdf --version
wkhtmltopdf 0.12.6.1 (with patched qt)

# wkhtmltopdf /tmp/x.html /tmp/x.pdf
Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done

# rox /tmp/x.pdf

Read the help screen: wkhtmltopdf has many options to tweak the output PDF (or image, with the included wkhtmltoimage).

Note regarding package size: wkhtmltopdf is big because it bundles a statically linked, patched build of WebKit. All other libraries are loaded dynamically.
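As a starting point for tweaking, here is a minimal sketch that wraps wkhtmltopdf with a few layout options. The function name html2pdf_wk and the WKHTMLTOPDF override variable are made up for this example, and the page size and margin values are arbitrary. Check the help screen of your build to confirm what it supports.

```shell
# Sketch only: html2pdf_wk and WKHTMLTOPDF are hypothetical names.
# Set WKHTMLTOPDF=echo for a dry run that just prints the arguments.
html2pdf_wk() {
    in=$1
    # Default output name: the input name with .html replaced by .pdf
    out=${2:-${in%.html}.pdf}
    "${WKHTMLTOPDF:-wkhtmltopdf}" \
        --page-size A4 \
        --orientation Portrait \
        --margin-top 15mm --margin-bottom 15mm \
        --margin-left 20mm --margin-right 10mm \
        "$in" "$out"
}
```

Call it as html2pdf_wk /tmp/x.html to get /tmp/x.pdf, or pass a second argument to choose the output file.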

Using LibreOffice

Code:

# libreoffice --headless --convert-to pdf /tmp/x.html
convert /tmp/x.html -> /root/x.pdf using filter : writer_web_pdf_Export

The result is, I believe, the same as one would get by importing the HTML file in the GUI program and then exporting it to PDF from the File menu. But it works without the GUI, so it's suitable for scripting.
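Note that in the run above the PDF landed in /root, the current directory, rather than next to the input file. For scripting it is probably more convenient to choose the output directory with --outdir. A minimal sketch follows, where the function name html2pdf_lo and the SOFFICE override variable are made up for this example:

```shell
# Sketch only: html2pdf_lo and SOFFICE are hypothetical names.
# Usage: html2pdf_lo OUTDIR FILE.html [FILE.html ...]
html2pdf_lo() {
    outdir=$1
    shift
    # Create the target directory if it does not exist yet
    mkdir -p "$outdir" || return 1
    "${SOFFICE:-libreoffice}" --headless --convert-to pdf \
        --outdir "$outdir" "$@"
}
```

For example, html2pdf_lo /tmp/pdf /tmp/*.html should convert every HTML file in /tmp in one go. I haven't checked how it behaves with a LibreOffice window already open.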

Using htmldoc

This is another program you can install from Gslapt. I'm going straight to an example:

Convert online FAQ to PDF book

Code:

# mkdir -p /tmp/test

# HTMLDOC_DEBUG=all htmldoc -t pdf13 -f "/tmp/test/fatdog.pdf" --webpage --no-title --linkstyle underline --size Universal --left 1.00in --right 0.50in --top 0.50in --bottom 0.50in --header .t. --header1 ... --footer h.i --nup 1 --tocheader .t. --tocfooter ..I --portrait --color --no-pscommands --no-xrxcomments --compression=1 --jpeg=0 --fontsize 11.0 --fontspacing 1.2 --headingfont Helvetica --bodyfont Sans --headfootsize 11.0 --headfootfont Helvetica --charset utf-8 --links --embedfonts --pagemode document --pagelayout single --firstpage p1 --pageeffect none --pageduration 10 --effectduration 1.0 --no-encryption --permissions all  --owner-password ""  --user-password "" --browserwidth 680 --path "/tmp/test" --no-strict --overflow http://distro.ibiblio.org/fatdog/web/index.html

No, I didn't write that long command line myself! htmldoc has an optional GUI dialog, which saves the resulting command as a text file (*.book). Note that the PDF document doesn't retain the original styles; htmldoc applies its own "clear reading" styles instead. I like the ability to create a binder, although that is also possible with wkhtmltopdf.
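If the full command is too much, a trimmed-down variant may be enough to get started. The sketch below keeps only a handful of the options from the command above and assumes htmldoc's defaults are acceptable for everything else, so the output will likely differ in details. The function name faq2pdf and the HTMLDOC_BIN override variable are made up for this example.

```shell
# Sketch only: faq2pdf and HTMLDOC_BIN are hypothetical names.
# Usage: faq2pdf URL_OR_FILE OUTPUT.pdf
faq2pdf() {
    url=$1
    out=$2
    # Make sure the output directory exists
    mkdir -p "$(dirname "$out")" || return 1
    # All options below are taken from the long command; the rest
    # are left to htmldoc's defaults (an assumption).
    "${HTMLDOC_BIN:-htmldoc}" -t pdf13 -f "$out" --webpage \
        --size Universal --fontsize 11.0 --charset utf-8 "$url"
}
```

For example: faq2pdf http://distro.ibiblio.org/fatdog/web/index.html /tmp/test/fatdog.pdf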

Using Chrome and Chromium-based browsers

Yet another option is to use the PDF printer built into Chrome/Chromium browsers. Here's an example using Vivaldi, but you should be able to replace vivaldi-stable with, say, google-chrome and get similar results.

Code:

# run-as-spot vivaldi-stable --headless --disable-gpu --print-to-pdf=/tmp/x-vivaldi.pdf /tmp/x.html
[0902/215738.919591:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[0902/215738.922587:WARNING:runtime_features.cc(629)] Topics cannot be enabled in this configuration. Use --enable-features=BrowsingTopics in addition.
[0902/215739.021238:WARNING:sandbox_linux.cc(430)] InitializeSandbox() called with multiple threads in process gpu-process.
[0902/215739.033372:WARNING:runtime_features.cc(629)] Topics cannot be enabled in this configuration. Use --enable-features=BrowsingTopics in addition.
63832 bytes written to file /tmp/x-vivaldi.pdf

This gave me the most faithful rendition of the original x.html file of all the methods discussed so far. A weak point is that there is nothing you can tweak in terms of page size, margins, etc.
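Since the browser name is the only thing that changes between vivaldi-stable, google-chrome and friends, a script can pick whichever one is installed. This is a minimal sketch. The candidate names in the default list, the BROWSERS override variable and both function names are assumptions for this example, and run-as-spot is the Fatdog helper used in the command above.

```shell
# Sketch only: find_browser, html2pdf_chromium and BROWSERS are
# hypothetical names; adjust the candidate list to your system.
find_browser() {
    for b in ${BROWSERS:-vivaldi-stable google-chrome chromium}; do
        # Print the first candidate found in PATH
        if command -v "$b" >/dev/null 2>&1; then
            printf '%s\n' "$b"
            return 0
        fi
    done
    return 1
}

# Usage: html2pdf_chromium INPUT.html OUTPUT.pdf
html2pdf_chromium() {
    browser=$(find_browser) || {
        echo "no Chromium-based browser found" >&2
        return 1
    }
    run-as-spot "$browser" --headless --disable-gpu \
        --print-to-pdf="$2" "$1"
}
```

Remember that the browser runs as user spot here, so the output path must be writable by spot (/tmp is fine).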

Using Firefox

I'm told Firefox can do the same trick as Chrome. I haven't tested it, but the command should be something like this (depending on your Firefox version):

Code:

firefox --headless --print-to-pdf "/tmp/x.html"