HTML, EPUB, PDF and automated publishing.

52midnight · Post by **52midnight** » Tue Jul 07, 2015 11:35 pm

I'm currently preparing a large, multi-file document for publication, and intend making it available in three formats:

- HTML for reading on the Net.
- EPUB for the modern eReaders.
- PDF for printed versions.

Given that I expect several revised versions in the coming months, I want to automate the conversion process as much as possible. I've selected 'pandoc' for investigation but haven't used it yet. Its being a command-line utility has some advantages, although if it succeeds it'll probably end up as the back-end for one or more GUI front-ends.

'Sigil' does a nice job of converting an HTML suite into an EPUB, and I might stick with it.

The problem is PDF. It began life as Adobe's proprietary format and was later published as an open standard (ISO 32000). Like all such things it has never achieved the 'open source feel' - Microsoft's .mht is another example. The biggest problem is editing PDFs. The gold standard is supposed to be Adobe's Acrobat. It's not only a monster in size, but it also tries to take over your whole system; I've long avoided it.

'qpdfview' does everything I want from a reader, but none of the others I've tried for editing seems much good. The solution I'm looking at is to use pandoc to convert HTML into ODT, do the editing in OpenOffice, and export into PDF.
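The HTML to ODT to PDF route described above could be scripted roughly as follows. This is only a sketch under assumptions: the function names and file names are made up for illustration, the PDF step assumes LibreOffice's `soffice` binary (its `--headless --convert-to` options may not exist in other OpenOffice builds), and the manual editing pass in the office suite is skipped entirely.

```python
import subprocess


def pandoc_odt_command(html_files, odt_path):
    """Build a pandoc command that merges the HTML files into one ODT file.

    pandoc concatenates multiple input files in the order given.
    """
    return (["pandoc", "-f", "html", "-t", "odt", "-o", str(odt_path)]
            + [str(f) for f in html_files])


def soffice_pdf_command(odt_path, out_dir):
    """Build a headless export command (assumes LibreOffice's soffice)."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(odt_path)]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical usage; nothing is executed until run_all() is called:
# run_all([
#     pandoc_odt_command(["ch01.html", "ch02.html"], "book.odt"),
#     soffice_pdf_command("book.odt", "out"),
# ])
```

Keeping command construction separate from execution makes it easy to print the commands for a dry run before letting the script loose on a new revision.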

Does anyone have experience or suggestions in this area?

ct85711 · Post by **ct85711** » Wed Jul 08, 2015 12:27 am

You may want to take a look at calibre. It is able to convert to all of those formats, and supports most if not all of the various ebook formats. It can also do mass conversions (it queues everything and runs a few conversions at the same time, I think 4-5 processes).

52midnight · Post by **52midnight** » Wed Jul 08, 2015 12:49 am

Been a while since I looked at Calibre. If it does PDFs then I'll definitely take a look. Thanks.

miket · Post by **miket** » Wed Jul 08, 2015 3:36 am

Good for you for thinking about the need to generate multiple output formats. I wish that the people who decided to play with the Gentoo home page would have that kind of sense.

A GUI tool is nice for checking things and maybe for some authoring tasks, but for that large-scale publishing you have in mind, nothing beats tools that can be controlled by scripts.

I tried pandoc for one project I had, but it didn't work out for that application. It would likely be better suited for other tasks. If you want it to output to PDF, you have to take a trip through LaTeX. LaTeX generates beautiful text, but there is a lot of overhead through this path.

One tool that some people have used at work, and one I plan to use for a project I have coming up, is wkhtmltopdf. It is quite a different entrant: it is a headless browser that generates PDFs with remarkable fidelity to the source HTML. It understands CSS and even JavaScript--though I imagine you might want to go light on the JavaScript for your purposes! From what I've seen, it's pretty fast, too. Yes, you could argue that Qt is a bigger dependency than LaTeX, but it might give you better results.
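For scripted use, a wkhtmltopdf call might be wrapped along these lines. Treat it as a sketch: the helper name, the file names and the A4 default are assumptions for illustration, and only the basic invocation (global options, then one or more input pages, then the output file) is relied on.

```python
import subprocess


def wkhtmltopdf_command(html_pages, pdf_path, page_size="A4"):
    """Build a wkhtmltopdf command rendering the pages into a single PDF.

    Global options come first, then the input pages, then the output file.
    """
    return (["wkhtmltopdf", "--page-size", page_size]
            + [str(p) for p in html_pages]
            + [str(pdf_path)])


def render_pdf(html_pages, pdf_path, page_size="A4"):
    """Run wkhtmltopdf and raise CalledProcessError if it fails."""
    subprocess.run(wkhtmltopdf_command(html_pages, pdf_path, page_size),
                   check=True)


# Hypothetical usage:
# render_pdf(["ch01.html", "ch02.html"], "book.pdf")
```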

To generate the source HTML, I could imagine using XSL transforms (my big go-to solution) and/or tools like pandoc.

charles17 · Post by **charles17** » Wed Jul 08, 2015 4:38 am

52midnight wrote: I'm currently preparing a large, multi-file document for publication, and intend making it available in three formats:

Sounds like generating a SAX stream from your single document source and having serializers for each output format. Have a look at Cocoon.

yngwin · Post by **yngwin** » Wed Jul 08, 2015 1:52 pm

Write your original in reStructuredText and use dev-python/sphinx (and dev-python/rst2pdf) to convert to various formats.
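A rough sketch of driving Sphinx for all three outputs from one script is shown below. The directory names and the helper are hypothetical, and the `pdf` builder is assumed to come from rst2pdf, which as far as I know needs `rst2pdf.pdfbuilder` added to `extensions` in the project's `conf.py` before it is available.

```python
import subprocess
from pathlib import Path


def sphinx_commands(source_dir, build_dir, builders=("html", "epub", "pdf")):
    """Build one sphinx-build command per output format.

    Each builder writes into its own subdirectory of build_dir.
    The "pdf" builder is assumed to be provided by rst2pdf.
    """
    return [["sphinx-build", "-b", builder, str(source_dir),
             str(Path(build_dir) / builder)]
            for builder in builders]


def build_all(source_dir, build_dir):
    """Run every builder in turn, stopping at the first failure."""
    for cmd in sphinx_commands(source_dir, build_dir):
        subprocess.run(cmd, check=True)


# Hypothetical usage:
# build_all("docs", "build")
```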
