Eureka! I Have a Workflow

WARNING: Geeky HTML and Makefile stuff in this post will render your eyeballs glassy if you’re not into that sort of thing.

I have been trying for weeks to create a workflow that allows me to edit my manuscript in Markdown format and then, with a simple command, generates all of the document formats I need to publish my book. That way I can create review copies and send them out, get feedback and make the changes I want, and then do it again — all in a precisely correct, reproducible way. Yesterday, in a long session, I managed to get the tools to do what I want. Maybe sharing this will help someone someday?

The requirements I had were:

  1. No manual steps at all. Just edit the manuscript and type make to get all of the required output files, ready to print or upload to Amazon, Google, Apple, my own Kindle, or whatever.

  2. Allow me to tag text with classes in my manuscript so I can include things like a real title page in a title-page-y font, a haiku, emails or text messages formatted so they look like they would on a computer screen, or even a THE END at the end of my story that is more than just some bold text on a line by itself. I wanted control over my inline and block element classes so I could do custom CSS for these special things.

  3. Let me include fonts in the output files, minimized to just the glyphs I use in my manuscript and ready to be used by the eBook reader if the user wishes it.

I use Python Markdown to generate HTML from the Markdown-formatted manuscript and Calibre’s ebook-convert command to do the multi-format conversion, taking my HTML, CSS, and font files as input and generating EPUB, AZW3, PDF, and maybe MOBI formats.

Even though I could have Calibre invoke it for me, I use the Python Markdown tool directly. I was having trouble getting Calibre to turn on the Markdown extensions I needed, and I just decided to Hell with it. Don’t get me wrong — Calibre is a wonderful tool, and I thank the obviously brilliant Kovid Goyal for creating it for us all to use. By all means, if you use Calibre, please consider donating some money or some code. The Markdown extensions I ended up needing for Dying to Live Forever were attr_list, smarty, fenced_code, and sane_lists. The output of the Python Markdown step is an HTML file with inline and block element classes I can style using my CSS to make everything look just the way I want it.
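For anyone who wants to try the same step from Python rather than the command line, here is a minimal sketch, assuming the third-party markdown package is installed. The extension names come from this post; the helper name manuscript_to_html and the .the-end class are made up for illustration and are not taken from the actual Makefile.

```python
# Extension names are the ones listed in the post.
EXTENSIONS = ["attr_list", "smarty", "fenced_code", "sane_lists"]


def manuscript_to_html(text):
    """Convert Markdown text to an HTML fragment (hypothetical helper)."""
    # Third-party dependency, imported lazily: pip install markdown
    import markdown

    return markdown.markdown(text, extensions=EXTENSIONS)
```

With attr_list enabled, a paragraph such as "THE END" followed on the next line by "{: .the-end }" should come out as a p element carrying class="the-end", which is the hook the CSS needs.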

Some notes about Calibre, in case anyone wonders why I did things the way I did.

First, I couldn’t find a way to get Calibre to substitute my scene breaks with what I wanted. The HTML file I gave Calibre used a simple <hr />, but nothing I did (and I tried for quite a while) could trick Calibre into substituting what I wanted, which was <div class="hr">&larr;&nbsp;&rarr;</div>. So I just brute-forced this using a global search-and-replace operation on the HTML before I gave it to Calibre to handle the rest.
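The brute-force substitution can be sketched in a few lines of Python. This is only an illustration of the idea, not the command actually used in the Makefile; the function name replace_scene_breaks is hypothetical, and the pattern assumes the breaks appear as <hr>, <hr/>, or <hr /> with no attributes.

```python
import re

# Replacement markup quoted from the post.
SCENE_BREAK = '<div class="hr">&larr;&nbsp;&rarr;</div>'

# Matches <hr>, <hr/>, and <hr /> (assumed forms; attributes are not handled).
HR_PATTERN = re.compile(r"<hr\s*/?>", re.IGNORECASE)


def replace_scene_breaks(html):
    """Swap every bare horizontal rule for the styled scene-break div."""
    # A function is used as the replacement so the text is inserted literally.
    return HR_PATTERN.sub(lambda match: SCENE_BREAK, html)
```

Running this over the HTML before handing it to Calibre leaves everything else in the file untouched.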

Second, as I mentioned before, I couldn’t get Calibre to use the extensions I wanted to pass to markdown_py, so I just brute-forced that too.

After the Markdown conversion, I run ebook-convert for each file format I need. Today I have EPUB and AZW3 formats, but I expect to add PDF soon so I can give a dead-trees copy of the book to some folks who don’t use eBook readers. Along the way, this dead-trees format was very nice for marked-up review copies with scribbles everywhere for comments and corrections.
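The one-run-per-format loop might look something like the sketch below. It relies only on the basic ebook-convert form of an input file followed by an output file, with the output format inferred from the extension; the real Makefile presumably passes further options for CSS and fonts, which are left out here rather than guessed at. The function names and the file names in the usage note are hypothetical.

```python
import subprocess

# Formats named in the post; "pdf" could be appended later.
FORMATS = ("epub", "azw3")


def conversion_commands(html_path, basename, formats=FORMATS):
    """Build one ebook-convert command line per output format."""
    return [
        ["ebook-convert", html_path, "{}.{}".format(basename, fmt)]
        for fmt in formats
    ]


def convert_all(html_path, basename, formats=FORMATS):
    """Run each conversion in turn, stopping at the first failure."""
    for command in conversion_commands(html_path, basename, formats):
        subprocess.run(command, check=True)
```

Calling convert_all("book.html", "book") would then produce book.epub and book.azw3, provided Calibre's command line tools are on the PATH.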

This ZIP file contains the Makefile and CSS I used. I used Calibre version 2.4 and Python Markdown version 2.4 (coincidence?).

Tools for the Writer

I devoted the entirety of my work time yesterday to trying to solve my workflow problems. I failed.

Software bugs suck.

I want a completely reproducible process to go from edits to my Markdown formatted manuscript to eBook formats for Kindle, Google, iBook, Nook, Smashwords, and possibly others. I had been using pandoc to transform the Markdown into the various forms, using a bizarre sequence of complicated command lines in a Makefile. I still want to do this, but each of the processing steps seemed to me to be full of fatal flaws, fatal limitations, or fatal pain-in-the-ass. The key word here is of course “fatal.”

As wonderful as Kovid’s product is, and as usable as it is for many things, I found the bugs in the latest (2.3) release of Calibre to be too much for me. Then I tried using Calibre’s command line stuff, which worked best with my Makefile scheme anyway. I failed, mostly because the processing steps in Calibre’s tools take any HTML class I specify and either replace it with Calibre’s own class name (e.g., “calibre42”), which is useless for me to build CSS rules for, or replace it with the class I specified in the first HTML div I put in my document, as if I wanted all of my divs to be of that class. Even a class I specified on the div itself was overridden by this bug. Of course I could go debug Calibre (I have spent many weeks lately debugging Python code at work trying to get OpenStack to work for me, for example).

Instead, I decided to try something else. I tried using Python Markdown directly, instead of using Calibre to call it. This produces HTML as its output, which has nearly all of what I need. Putting the resulting HTML through Calibre’s converter to get EPUB or AZW3 suffered from the same bugs. But I need a way to take the correctly translated HTML from Python Markdown and split it into separate documents at chapter boundaries, to write the TOC and contents.opf files, etc.
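The chapter-splitting step is simple enough to sketch. The version below is in Python purely for illustration, and it assumes each chapter starts at an h1 heading, which this post does not actually state. The name split_chapters is hypothetical, and writing the TOC and contents.opf files is not covered.

```python
import re

# Assumption: every chapter begins with an <h1> tag, with or without attributes.
H1_START = re.compile(r"<h1[\s>]", re.IGNORECASE)


def split_chapters(html):
    """Cut an HTML string into pieces, each starting at an <h1> heading.

    Anything before the first heading (front matter) becomes its own piece.
    """
    starts = [match.start() for match in H1_START.finditer(html)]
    if not starts:
        return [html]
    # Include offset 0 only when there is text ahead of the first heading.
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(html)]
    pieces = [html[a:b] for a, b in zip(bounds, bounds[1:])]
    # Drop pieces that contain nothing but whitespace.
    return [piece for piece in pieces if piece.strip()]
```

Each returned piece could then be written to its own file and listed in the manifest and table of contents.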

So I downloaded and built Sigil, which I gather is the spiritual predecessor of Calibre in some way? It is written in C++ and, therefore, has a very long and painful build process. But Sigil works very well for the things it does. It just doesn’t do all of the things I need, and it’s not a command line thing, so I can’t use Makefiles to automate the process like I wanted.

I’m about to go write some Node.js code to do this. JavaScript is a language I’m familiar with, and I have wanted a good project using it anyway.

But first, I need to manually build my eBook and get this fucking book published. Then I can build tools to replicate that process for revisions and future books. I think I’ll open source the result.

Slogging and blogging

Gawd this is a slog! I’m at the 63% point in my manuscript, according to my text editor. This is going slowly, but I feel good about it. The revisions are valuable: I’m making this much better. And I’m finding various small (usually) errors and inconsistencies in the storytelling. I needed to do this.

Some people might be interested in the tool chain I’m using to finish this book. Most of the book has been written using these tools, although I started out — years ago, now — using what used to be called StarOffice back in the day.

I have used Linux from the beginning. I hate Windows, and I didn’t get a MacBook until recently, and that is the property of my employer anyway. So, Linux it is. I use KDE on Ubuntu, and I’m pretty happy with it. I have eschewed fancy and distracting WYSIWYG word processors for something simpler and more contained.

I use GNU Emacs for editing the text itself, a GNU Makefile to drive the tools to transform this into the finished PDF and eBook formats, and various other tools as they’re needed. I couldn’t live without Emacs’ ispell package, and I have made good use of ediff to merge versions of the manuscript when I screwed up and edited them on two different computers and had to recover from that.

I have an ARM-based Samsung Chromebook for when I’m working on the train or out and about. ChromeOS is wonderful for many things, but I wanted my Linux tools, so I installed crouton to get Emacs and the other Linux goodies. I use a desktop PC (built it myself!) at home for the really heavy editing when I’m there.

After editing the text form, which is in Markdown format by the way, I pass the whole thing through pandoc to get HTML for one version and EPUB for another. The HTML goes through LibreOffice using an unfortunately still mostly manual process to create a PDF for someone who needs a dead-trees copy to write on.
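The two pandoc passes can be written down roughly as follows. This is a sketch, not the actual Makefile rules: the file names are placeholders, and the only pandoc options used are -s for a standalone document and -o for the output file, with the output format taken from the extension.

```python
import subprocess


def pandoc_commands(manuscript, basename):
    """Build the HTML and EPUB pandoc command lines (hypothetical helper)."""
    return {
        # -s asks pandoc for a complete HTML document, not a fragment.
        "html": ["pandoc", manuscript, "-s", "-o", basename + ".html"],
        "epub": ["pandoc", manuscript, "-o", basename + ".epub"],
    }


def build(manuscript, basename):
    """Run both conversions; requires pandoc on the PATH."""
    for command in pandoc_commands(manuscript, basename).values():
        subprocess.run(command, check=True)
```

The LibreOffice step that turns the HTML into a PDF is left out, since it is still a manual process.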

The EPUB version goes through Calibre to be transformed into the various eBook formats required by Google, Amazon, Apple, and Smashwords. These conversions are still somewhat buggy. Before I publish this thing I’m going to have to make them bullet-proof. I already have a very nice book design and cover, but getting the tools to generate a form that each of these eBook vendors can gobble up and feed to the various eBook readers while retaining some semblance of the original formatting is hard.

I’m learning by doing. I have new respect for the publishing houses and the companies that do all of these things as a service. When I’m rich someday I might ask someone else to do these jobs for me. For now, my time is free.