Ask HN: What is the (open-source) way of converting HTML to PDF nowadays?

47 points by hhthrowaway1230 3 days ago


I'm using wkhtmltopdf, but it is painful to work with. What are other people using nowadays, e.g. Canva or other tools?

pabs3 - 2 days ago

Just print to PDF in a browser, or automate that using a browser automation tool. For a non-browser-based open source solution, WeasyPrint.

https://weasyprint.org/

For a proprietary solution, try Prince XML:

https://www.princexml.com/

kappadi3 - 3 days ago

Puppeteer and Playwright are the main open-source options nowadays, both solid for HTML → PDF once your print CSS is sorted. Don’t forget proper page breaks (break-before/after/inside): for example, break-after: page works in Chromium, while break-after: always doesn’t. For trickier pagination you can look at Paged.js, and I’d test layouts in Chrome/Edge before automating.

Shameless plug: I run yakpdf.com, a hosted Puppeteer-based service if you want to avoid self-hosting. https://rapidapi.com/yakpdf-yakpdf/api/yakpdf
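A minimal sketch of the print-CSS advice above, in Python rather than Node, assuming Playwright for Python is installed along with its Chromium build. The `.chapter` class, the A4 page size, the margins, and the helper names `with_print_css` and `render_pdf` are illustrative assumptions, not anything from the comment.

```python
# Illustrative print stylesheet: the selectors and sizes are assumptions.
PRINT_CSS = """
@page { size: A4; margin: 20mm; }
.chapter { break-after: page; }          /* end each chapter with a page break */
figure, table { break-inside: avoid; }   /* try to keep figures and tables whole */
"""


def with_print_css(body_html: str, css: str = PRINT_CSS) -> str:
    """Wrap an HTML fragment in a full document that carries the print CSS."""
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<style>{css}</style></head><body>{body_html}</body></html>"
    )


def render_pdf(html: str, out_path: str) -> None:
    """Render HTML to a PDF file with headless Chromium via Playwright.

    Needs `pip install playwright` and `playwright install chromium`.
    """
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html)
        # prefer_css_page_size lets the @page rule decide the paper size.
        page.pdf(path=out_path, print_background=True, prefer_css_page_size=True)
        browser.close()


# Build a two-chapter document; each chapter ends with a page break.
doc = with_print_css(
    "<section class='chapter'><h1>One</h1></section>"
    "<section class='chapter'><h1>Two</h1></section>"
)
# render_pdf(doc, "out.pdf")  # uncomment once Playwright and Chromium are installed
```

Keeping the CSS in the document itself means the same HTML can be checked with the browser's print preview before it goes through automation.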

Aachen - 2 hours ago

Please don't turn nice formats into a format that's similar to screenshots of text. Pandoc has an option to pack all images and styles needed to render the page into one html file:

    pandoc --self-contained input.html -o output.html

delduca - 4 minutes ago

https://gotenberg.dev

lizimo - 9 minutes ago

If generating PDF dynamically is what you really care about, consider Typst. https://typst.app/ We use it in production to generate reports, and it is amazing.

Snawoot - 3 hours ago

    chrome --headless --disable-gpu --print-to-pdf https://example.com
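For batch use, the command above could be wrapped in a small script. This is only a sketch: the binary name `chrome` is taken from the command as posted and may be `chromium`, `google-chrome`, or a full path on your system, and the helper names and the 60-second timeout are assumptions. It uses the `--print-to-pdf=<path>` form of the flag to choose the output file.

```python
import subprocess
from typing import List


def chrome_pdf_command(url: str, out_path: str, binary: str = "chrome") -> List[str]:
    """Build the argv list for a headless Chrome print-to-PDF run."""
    return [binary, "--headless", "--disable-gpu", f"--print-to-pdf={out_path}", url]


def print_to_pdf(url: str, out_path: str, binary: str = "chrome", timeout: int = 60) -> None:
    """Run Chrome and raise if it exits non-zero or exceeds the timeout."""
    subprocess.run(chrome_pdf_command(url, out_path, binary), check=True, timeout=timeout)


# Building the command does not require Chrome to be installed.
cmd = chrome_pdf_command("https://example.com", "example.pdf")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when URLs contain `&` or `?`.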

RiverCrochet - 2 hours ago

If you don't really need the PDF but just want to archive pages, SingleFile is better. It'll capture the entire page to a single HTML file and I find this is better than the PDF if I don't want to print it. It's a browser extension, but there's also a command line version (https://github.com/gildas-lormeau/single-file-cli) that uses Chrome or Chromium's headless mode.

juice_bus - an hour ago

I have Chromium shoved into an AWS Lambda Layer; when we need HTML to PDF conversion, we hand it off to that. It loads the HTML into Chromium, then "prints" it to PDF.

freedomben - 32 minutes ago

I'd love to go the other way: convert a PDF into a self contained HTML page that renders properly in a browser. It's been way harder than I thought it would. Any advice?

thangalin - 2 hours ago

Is this an XY problem? If you have the original document (in Markdown), one possibility would be to use my software, KeenWrite[1], to convert Markdown to XHTML, then typeset the XHTML to PDF via ConTeXt. See the user manual[2] for an example of a Markdown document typeset in this fashion, along with usage instructions.

If you only have HTML to work with, you can also use Flying Saucer[3], which is what KeenWrite uses to preview Markdown documents when rendered as HTML. Flying Saucer uses an open-source version of iText[4] to produce PDF documents (from HTML source docs).

Another possibility is to use pandoc and LaTeX.

[1]: https://keenwrite.com/

[2]: https://keenwrite.com/docs/user-manual.pdf

[3]: https://github.com/flyingsaucerproject/flyingsaucer

[4]: https://itextpdf.com/

etyhhgfff - 2 hours ago

What exactly is so painful about it? It is just one command, can be isolated in a container and runs on every Linux machine.

    docker run alpine-wkhtmltopdf google.com - > test.pdf

Source: https://github.com/madnight/docker-alpine-wkhtmltopdf

bob1029 - an hour ago

If your HTML is simply an intermediary to get you to a PDF, you could consider just skipping straight to building the PDF directly:

https://pdfbox.apache.org

This would be far more efficient than spinning up an entire browser and printing PDFs to disk.

haft - 2 hours ago

The reverse of this question: what is the best way to convert PDF to HTML? We are required by accessibility law to make our PDFs WCAG compliant; however, it would be easier to convert these to HTML.

ratStallion - an hour ago

My website's content is xml, and I use Apache Fop to turn it into a PDF with page numbers and other nice things. It works nicely, but takes some setup.

nicoburns - 2 hours ago

https://github.com/plutoprint/plutobook was a recent Show HN and looks excellent

mightjustwork - 3 days ago

https://gotenberg.dev/ ...has been working well for me for the last few years. It's a headless instance of Google Chrome with a golang wrapper. Runs well in Docker or a cloud instance.
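A rough sketch of calling a locally running Gotenberg container from Python, using only the standard library. The port `3000`, the route `/forms/chromium/convert/html`, the form field name `files`, and the requirement that the upload be named `index.html` are assumptions based on my reading of the Gotenberg docs; check them against the version you deploy. The helper names are hypothetical.

```python
import urllib.request
import uuid


def build_gotenberg_request(
    html: str, base_url: str = "http://localhost:3000"
) -> urllib.request.Request:
    """Build a multipart/form-data POST that uploads the HTML as index.html."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="index.html"\r\n'
        "Content-Type: text/html\r\n\r\n"
        f"{html}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/forms/chromium/convert/html",  # assumed route, see lead-in
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def convert(html: str, out_path: str, base_url: str = "http://localhost:3000") -> None:
    """Send the request and write the returned PDF bytes to out_path."""
    request = build_gotenberg_request(html, base_url)
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


# Building the request needs no running server; convert() does.
req = build_gotenberg_request("<h1>Hello</h1>")
```

Because the conversion sits behind HTTP, the application itself does not need Chrome installed, only network access to the container.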

hhthrowaway1230 - an hour ago

5k pdfs a month for archival purposes, must be pdf, customers demand this

throw03172019 - 3 days ago

I run chromium on my server and render the PDF from there using puppeteer.

zja - 3 days ago

pandoc

ftchd - an hour ago

the only thing I found to work reliably well is simply Chromium's print feature

exabrial - 3 hours ago

openhtmltopdf is what we're using. Some outdated versions.

fogzen - 15 hours ago

Don’t. Show a web page and open the print dialog, and tell people to save as PDF. All major browsers support this, and the browser HTML to PDF code is the most robust and accurate.

journal - 15 hours ago

if you are doing html to pdf, you might also need the ability to merge. a few more features and you're better off with a commercial solution.